BeautifulSoup and pandas: how to assign values to DataFrame columns

Posted 2024-04-29 00:05:38


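I am trying to extract data from ESPN using BeautifulSoup.

However, I cannot get the values assigned to the correct columns. Some of the values I extract shift the rest of the data into different columns.

When I export the data to Excel, the values are not aligned properly. Is there a proper way to assign the values to the correct columns?

Any help would be greatly appreciated.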

Code:

from bs4 import BeautifulSoup
import pandas as pd
import requests

       
url = 'https://www.espn.com/nba/player/gamelog/_/id/3012/kyle-lowry'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')


columns=[
 'Date',
 'OPP',
 'Result',
 'MIN',
 'FG',
 'FG%',
 '3PT',
 '3P%',
 'FT',
 'FT%',
 'REB',
 'AST',
 'BLK',
 'STL',
 'PF',
 'TO',
 'PTS',
 ]

# create dataframe
d1 = pd.DataFrame(columns=columns)


full = []

for data in soup.find_all('td', attrs = {'class': 'Table__TD'}):
    val = data.get_text()
    full.append(val)
    
# separate full list into sub-lists with 17 elements
rows = [full[i: i+17] for i in range(0, len(full), 17)]

# append list of lists structure to dataframe
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
d1 = pd.concat([d1, pd.DataFrame(rows, columns=d1.columns)])


print(d1)
d1.to_csv('C:\\Users\\Jonathan\\test7.csv')

1 answer
User
#1 · Posted 2024-04-29 00:05:38

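You can try this script to load the data correctly. It reads the table row by row and skips any row that does not have exactly 17 cells, so values cannot shift into the wrong column: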

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.espn.com/nba/player/gamelog/_/id/3012/kyle-lowry'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
columns = ['Date','OPP','Result','MIN','FG','FG%','3PT','3P%','FT','FT%','REB','AST','BLK','STL','PF','TO', 'PTS']

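all_data = []
# Walk the table one row at a time so each row's cells stay together
for row in soup.select('.Table__TR'):
    tds = [td.get_text(strip=True, separator=' ') for td in row.select('.Table__TD')]
    # Skip rows that do not have exactly one cell per column
    if len(tds) != 17:
        continue
    all_data.append(tds)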

df = pd.DataFrame(all_data, columns=columns)
print(df)
df.to_csv('data.csv')

Output:

         Date     OPP        Result MIN     FG   FG%  ... AST BLK STL PF TO PTS
0    Wed 8/19  vs BKN      W 104-99  37   7-14  50.0  ...   3   1   2  4  2  21
1    Mon 8/17  vs BKN     W 134-110  38   3-14  21.4  ...   6   1   0  2  0  16
2    Wed 8/12   @ PHI     W 125-121  25   6-11  54.5  ...   3   0   2  4  0  19
3     Sun 8/9  vs MEM      W 108-99  37   4-12  33.3  ...   8   0   4  3  9  15
4     Fri 8/7  vs BOS     L 122-100  28    3-6  50.0  ...   2   1   3  3  3  11
..        ...     ...           ...  ..    ...   ...  ...  ..  ..  .. .. ..  ..
57  Mon 10/28  vs ORL      W 104-95  38   7-18  38.9  ...   6   0   0  2  2  26
58  Sat 10/26   @ CHI      W 108-84  34   4-11  36.4  ...   8   1   1  2  2  11
59  Fri 10/25   @ BOS     L 112-106  40  11-18  61.1  ...   7   1   1  5  4  29
60  Tue 10/22   vs NO  W 130-122 OT  45   4-15  26.7  ...   6   0   2  4  4  22
61  Fri 10/18   @ BKN     W 123-107  26   3-12  25.0  ...   5   0   1  0  3   9

[62 rows x 17 columns]

The script also saves the table to data.csv, which can be opened in a spreadsheet program such as LibreOffice Calc.
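
To see why the code in the question misaligns, here is a minimal sketch that needs no network access. It assumes the page's table contains at least one row with fewer cells than the others; the `len(tds) != 17` check in the answer suggests this, but it is not verified here. The table below is a hypothetical 4-column miniature (the real one has 17 columns), and the single-cell row is made up for illustration.

```python
# Hypothetical miniature of the game-log table: 4 columns instead of 17.
WIDTH = 4
table_rows = [
    ["Wed 8/19", "vs BKN", "W 104-99", "21"],
    ["august"],  # made-up short row, e.g. a label that spans the table
    ["Mon 8/17", "vs BKN", "W 134-110", "16"],
    ["Wed 8/12", "@ PHI", "W 125-121", "19"],
]

# Approach from the question: collect every cell into one flat list,
# then cut the list into fixed-size chunks.
flat = [cell for row in table_rows for cell in row]
chunked = [flat[i:i + WIDTH] for i in range(0, len(flat), WIDTH)]

# Approach from the answer: keep row boundaries and drop rows
# whose cell count does not match the number of columns.
row_wise = [row for row in table_rows if len(row) == WIDTH]

print("chunked:")
for row in chunked:
    print("  ", row)

print("row-wise:")
for row in row_wise:
    print("  ", row)
```

In the chunked result, every row after the short one starts one cell too early, so a points value ends up in the date column. Keeping the row boundaries avoids this because a short row can only affect itself.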
