Scraping table data split across multiple pages from multiple URLs (Python and BeautifulSoup)

Posted 2024-04-26 15:03:05


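New coder here! I am trying to scrape web table data from multiple URLs. Each URL has one table, but that table is split across several pages. My code only iterates through the table pages of the first URL and not the rest, so I get pages 1-5 of the NBA data for the year 2000 and then it stops. How can I get my code to pull the data for every year? Any help is greatly appreciated.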

import requests
from bs4 import BeautifulSoup

# `headers` (a dict of request headers) is assumed to be defined earlier
page = 1
year = 2000

while page < 20 and year < 2020:
    base_URL = 'http://www.espn.com/nba/salaries/_/year/{}/page/{}'.format(year, page)
    response = requests.get(base_URL, headers=headers)

    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        sal_table = soup.find_all('table', class_='tablehead')
        if len(sal_table) < 2:
            sal_table = sal_table[0]
            with open('NBA_Salary_2000_2019.txt', 'a') as r:
                for row in sal_table.find_all('tr'):
                    for cell in row.find_all('td'):
                        r.write(cell.text.ljust(30))
                    r.write('\n')
            page += 1
        else:
            print("too many tables")
    else:
        # Only reached on a non-200 response, so the year never advances
        year += 1
        page = 1

4 Answers

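I would consider using pandas here because 1) its read_html() function (which uses BeautifulSoup under the hood) makes it easier to parse <table> tags, and 2) it can then easily write straight to a file.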

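That said, iterating through 20 pages for every season is wasteful (for example, the first season you are after only has 4 pages, and the rest are blank). So I would consider adding a check so that once the code reaches a blank table, it moves on to the next season.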

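import io

import pandas as pd
import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}

frames = []  # one DataFrame per scraped page
year = 2000

while year < 2020:
    goToNextPage = True
    page = 1
    while goToNextPage:
        base_URL = 'http://www.espn.com/nba/salaries/_/year/{}/page/{}'.format(year, page)
        # headers must be passed by keyword; the second positional argument is params
        response = requests.get(base_URL, headers=headers)
        if response.status_code != 200:
            # Give up on this season instead of retrying the same page forever
            goToNextPage = False
            continue

        # Parse the HTML already downloaded rather than fetching the URL again
        temp_df = pd.read_html(io.StringIO(response.text))[0]
        temp_df.columns = list(temp_df.iloc[0, :])
        # Drop the header rows that the site repeats inside the table body
        temp_df = temp_df[temp_df['RK'] != 'RK'].copy()

        if len(temp_df) == 0:
            # Blank table: this season has no more pages
            goToNextPage = False
            continue

        print('Acquiring Season: %s\tPage: %s' % (year, page))

        temp_df['Season'] = '%s-%s' % (year - 1, year)
        frames.append(temp_df)
        page += 1

    year += 1

# DataFrame.append was removed in pandas 2.0, so combine the pages with concat
results = pd.concat(frames, ignore_index=True, sort=False)
results.to_csv('c:/test/NBA_Salary_2000_2019.csv', index=False)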

Output:

print (results.head(25).to_string())
    RK                     NAME                    TEAM       SALARY     Season
0    1      Shaquille O'Neal, C      Los Angeles Lakers  $17,142,000  1999-2000
1    2        Kevin Garnett, PF  Minnesota Timberwolves  $16,806,000  1999-2000
2    3       Alonzo Mourning, C              Miami Heat  $15,004,000  1999-2000
3    4         Juwan Howard, PF      Washington Wizards  $15,000,000  1999-2000
4    5       Scottie Pippen, SF  Portland Trail Blazers  $14,795,000  1999-2000
5    6          Karl Malone, PF               Utah Jazz  $14,000,000  1999-2000
6    7         Larry Johnson, F         New York Knicks  $11,910,000  1999-2000
7    8          Gary Payton, PG     Seattle SuperSonics  $11,020,000  1999-2000
8    9      Rasheed Wallace, PF  Portland Trail Blazers  $10,800,000  1999-2000
9   10            Shawn Kemp, C     Cleveland Cavaliers  $10,780,000  1999-2000
10  11     Damon Stoudamire, PG  Portland Trail Blazers  $10,125,000  1999-2000
11  12      Antonio McDyess, PF          Denver Nuggets   $9,900,000  1999-2000
12  13       Antoine Walker, PF          Boston Celtics   $9,000,000  1999-2000
13  14  Shareef Abdur-Rahim, PF     Vancouver Grizzlies   $9,000,000  1999-2000
14  15        Allen Iverson, SG      Philadelphia 76ers   $9,000,000  1999-2000
15  16            Vin Baker, PF     Seattle SuperSonics   $9,000,000  1999-2000
16  17            Ray Allen, SG         Milwaukee Bucks   $9,000,000  1999-2000
17  18    Anfernee Hardaway, SF            Phoenix Suns   $9,000,000  1999-2000
18  19          Kobe Bryant, SF      Los Angeles Lakers   $9,000,000  1999-2000
19  20      Stephon Marbury, PG         New Jersey Nets   $9,000,000  1999-2000
20  21           Vlade Divac, C        Sacramento Kings   $8,837,000  1999-2000
21  22         Bryant Reeves, C     Vancouver Grizzlies   $8,666,000  1999-2000
22  23        Tom Gugliotta, PF            Phoenix Suns   $8,558,000  1999-2000
23  24        Nick Van Exel, PG          Denver Nuggets   $8,354,000  1999-2000
24  25        Elden Campbell, C       Charlotte Hornets   $7,975,000  1999-2000
...
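
If it helps to see the "stop at the first blank page, then move to the next season" idea on its own, here is a minimal sketch with the network code factored out. The names `scrape_season`, `scrape_seasons`, `fetch_page`, `fake_fetch_page` and the `max_pages` safety cap are all hypothetical and not part of the answer above; `fetch_page` is assumed to be any callable that returns the rows of one page as a list of dicts, and an empty list for a blank page.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def scrape_season(year: int,
                  fetch_page: Callable[[int, int], List[Row]],
                  max_pages: int = 50) -> List[Row]:
    """Collect the rows of one season, stopping at the first blank page."""
    rows: List[Row] = []
    season = '%s-%s' % (year - 1, year)
    # max_pages is an assumed safety cap so a misbehaving site cannot loop forever
    for page in range(1, max_pages + 1):
        page_rows = fetch_page(year, page)
        if not page_rows:
            break  # blank page: nothing more for this season
        for row in page_rows:
            rows.append({**row, 'Season': season})
    return rows


def scrape_seasons(first_year: int,
                   last_year: int,
                   fetch_page: Callable[[int, int], List[Row]]) -> List[Row]:
    """Run scrape_season for every year in the inclusive range."""
    all_rows: List[Row] = []
    for year in range(first_year, last_year + 1):
        all_rows.extend(scrape_season(year, fetch_page))
    return all_rows


def fake_fetch_page(year: int, page: int) -> List[Row]:
    """Stand-in for a real downloader: season 2000 has 2 pages, 2001 has 1."""
    pages_per_season = {2000: 2, 2001: 1}
    if page > pages_per_season.get(year, 0):
        return []
    return [{'RK': str(page), 'NAME': 'Player %s-%s' % (year, page)}]


rows = scrape_seasons(2000, 2001, fake_fetch_page)
print(len(rows))
```

In real use, `fetch_page` would wrap the request and table parsing (for instance, `requests.get` plus `pd.read_html`, returning `df.to_dict('records')`), which keeps the pagination logic easy to check without hitting the site.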
