I get an empty DataFrame when trying to scrape HTML from the web. Why?

Posted 2024-06-07 23:08:24


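I am trying to scrape salary data from Basketball Reference using Python 3.x and pandas. I do not get any error message, but I do not get any output either. I want the second and fourth columns of the table: "Player" and the "2019-20" salary. What am I doing wrong?

This is what I have so far: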

# URL of the page we will be scraping
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text

# this is the HTML from the given URL
soup = BeautifulSoup(html)

# Take the player salary data and build a list of lists, where each inner list holds the values for one player
salaries = []
for x in soup.find_all('tr')[2:]:
    tds_salaries = x.find_all('td')
    name_s = tds_salaries[0].text
    salary = tds_salaries[2].text
    salaries.append([name_s, salary[1:]])

#create a salary pandas dataframe
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])

salaries_df.head()


1 answer
User
#1 · Posted 2024-06-07 23:08:24

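It works fine here. All I did was wrap the body of the for loop in a try/except so that header rows are skipped. A row with no td cells makes x.find_all('td') return an empty list, so tds_salaries[0] raises an IndexError, which the except clause catches. Note also that the code below passes page (the response text) to BeautifulSoup, whereas the snippet in the question passes html, a name that is not defined in the code shown.

Code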

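import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the page and keep its HTML text
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text

# Parse the downloaded text; naming the parser avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(page, 'html.parser')

salaries = []
for x in soup.find_all('tr')[2:]:
    try:
        tds_salaries = x.find_all('td')
        name_s = tds_salaries[0].text
        salary = tds_salaries[2].text
        # Drop the first character of the salary text (the currency sign)
        salaries.append([name_s, salary[1:]])
    except IndexError:
        # Rows without enough <td> cells end up here
        print('This is a header!')

salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])

print(salaries_df)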

Output

                  name      salary
0        Stephen Curry  40,231,758
1    Russell Westbrook  38,506,482
2           Chris Paul  38,506,482
3            John Wall  38,199,000
4         James Harden  38,199,000
..                 ...         ...
570    Hollis Thompson      50,000
571         Tyler Ulis      50,000
572  Demetrius Jackson      18,312
573    Jordan Caroline       6,000
574    Anthony Bennett       6,000

[575 rows x 2 columns]
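
As an alternative to catching IndexError, header rows can be skipped with an explicit length check. The following is only a minimal sketch of that idea, under some assumptions: SAMPLE_HTML is made-up markup (not the real page) in which header rows contain only th cells, the names RowCollector and extract_salaries are hypothetical, and the standard library's html.parser is used in place of BeautifulSoup so that the example runs without a network connection or third-party packages.

```python
from html.parser import HTMLParser

# Made-up sample markup (an assumption, not the real page):
# header rows contain only <th> cells, data rows contain <td> cells.
SAMPLE_HTML = """
<table>
  <tr><th></th><th></th><th>Salary</th></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>1</th><td>Player A</td><td>AAA</td><td>$1,000</td></tr>
  <tr><th>2</th><td>Player B</td><td>BBB</td><td>$2,500</td></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>3</th><td>Player C</td><td>CCC</td><td>$300</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of the <td> cells of every <tr>, one list per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td" and self.rows:
            self._in_td = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data     # accumulate text inside the current <td>


def extract_salaries(html):
    """Return [name, salary] pairs, skipping rows with fewer than three <td> cells."""
    parser = RowCollector()
    parser.feed(html)
    salaries = []
    for tds in parser.rows:
        if len(tds) < 3:                  # header row: no usable <td> cells
            continue
        salaries.append([tds[0], tds[2].lstrip("$")])
    return salaries


print(extract_salaries(SAMPLE_HTML))
```

In the BeautifulSoup loop above, the equivalent check would be `if len(tds_salaries) < 3: continue` placed right after the find_all('td') call. Unlike a broad try/except, it would not hide an IndexError raised for some other reason.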
