Scraping a table from the web skips some values

Posted 2024-06-16 10:03:27


I'm working on a small coding project to learn web scraping, and decided to pull a table from a fantasy football site I like. The table can be found here: https://fantasydata.com/nfl/fantasy-football-leaders?position=1&team=1&season=2018&seasontype=1&scope=1&subscope=1&scoringsystem=2&aggregatescope=1&range=1

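When I try to scrape the table, the first 10 rows come out fine, but starting with the row for Brian Hill, every stat value (GMS onward) comes out blank. As I usually do when I hit a problem, I inspected the page, and the rows after Hill appear to follow the same structure as the rows before them. Any help solving this, and an explanation of why it happens in the first place, would be much appreciated.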

import pandas
from bs4 import BeautifulSoup
from selenium import webdriver

# URL pieces; position, team and season are filled in per request
URLA = 'https://fantasydata.com/nfl/fantasy-football-leaders?position='
URLB = '&team='
URLC = '&season='
URLD = '&seasontype=1&scope=1&subscope=1&scoringsystem=2&aggregatescope=1&range=3'
POSITIONNUMBER = [1,6,7]


TEAMNUMBER = [1]


def buildStatsTable(year):
    teamFrames = []
    position = 1
    headers = ['Name', 'Team', 'Pos', 'GMS', 'PassingYards', 'PassingTDs', 'PassingINTs',
               'RushingYDs', 'RushingTDs', 'ReceivingRECs', 'ReceivingYDs', 'ReceivingTDs',
               'FUM LST', 'PPG', 'FPTS']
    for team in TEAMNUMBER:
        currURL = URLA + str(position) + URLB + str(team) + URLC + str(year) + URLD
        # Render the page in a real browser, then parse the resulting HTML
        driver = webdriver.Chrome()
        driver.get(currURL)
        soup = BeautifulSoup(driver.page_source, "lxml")
        driver.quit()
        tr = soup.find_all('tr', {'role': 'row'})
        length = len(tr)
        # This code assumes the first half of the rows holds the player names
        # and the second half holds the matching stats
        offset = length // 2
        maxCap = int((length - 1)/2) + 1
        tableList = []
        for i, row in enumerate(tr[2:maxCap]):
            player = row.get_text().split('\n', 2)[1]
            player_row = [value.get_text() for value in tr[i + offset + 1].contents]
            tableList.append([player] + player_row)
        teamFrames.append(pandas.DataFrame(columns=headers, data=tableList))
    # DataFrame.append was removed in pandas 2.0; concat also renumbers the index from 0
    return pandas.concat(teamFrames, ignore_index=True)

falcons = buildStatsTable(2018)
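
As an aside, the URL that the code above glues together from URLA through URLD could also be built from a dictionary of query parameters. The following is only a sketch: `BASE_URL`, `build_url` and the `range_` argument are names made up for illustration, and the parameter values are copied from the constants above.

```python
from urllib.parse import urlencode

# Hypothetical constant: the part of URLA before the query string
BASE_URL = 'https://fantasydata.com/nfl/fantasy-football-leaders'


def build_url(position, team, season, range_=3):
    """Assemble the leaders URL from its query parameters (hypothetical helper)."""
    params = {
        'position': position,
        'team': team,
        'season': season,
        'seasontype': 1,
        'scope': 1,
        'subscope': 1,
        'scoringsystem': 2,
        'aggregatescope': 1,
        'range': range_,
    }
    # urlencode keeps the dictionary order and joins the pairs with '&'
    return BASE_URL + '?' + urlencode(params)


print(build_url(1, 1, 2018))
```

For position 1, team 1 and season 2018 this produces the same string as the concatenation in buildStatsTable. Note that the code requests range=3, while the link given at the top of the question uses range=1.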

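Actual result of buildStatsTable(2018) (only the first few columns are shown to keep the post short; the problem is the same in every column)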

               Name Team Pos GMS PassingYards PassingTDs PassingINTs  \
0         Matt Ryan  ATL  QB  16         4924         35           7   
1       Julio Jones  ATL  WR  16            0          0           0   
2     Calvin Ridley  ATL  WR  16            0          0           0   
3     Tevin Coleman  ATL  RB  16            0          0           0   
4      Mohamed Sanu  ATL  WR  16            5          1           0   
5     Austin Hooper  ATL  TE  16            0          0           0   
6         Ito Smith  ATL  RB  14            0          0           0   
7      Justin Hardy  ATL  WR  16            0          0           0   
8       Marvin Hall  ATL  WR  16            0          0           0   
9     Logan Paulsen  ATL  TE  15            0          0           0   
10       Brian Hill  ATL  RB                                           
11  Devonta Freeman  ATL  RB                                           
12     Russell Gage  ATL  WR                                           
13     Eric Saubert  ATL  TE
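
To pin down where the blanks start, a small check along these lines could be run on tableList before it is turned into a DataFrame. This is a sketch only: the helper names are made up, the sample rows are copied from the first seven columns of the output above, and it assumes the blank cells are empty or whitespace-only strings.

```python
def blank_cells_per_row(table_rows):
    """Count the empty or whitespace-only cells in each row."""
    return [sum(1 for cell in row if str(cell).strip() == '')
            for row in table_rows]


def first_row_with_blanks(table_rows):
    """Return the index of the first row containing a blank cell, or None."""
    for index, count in enumerate(blank_cells_per_row(table_rows)):
        if count > 0:
            return index
    return None


# Two rows taken from the output above (first seven columns only)
sample = [
    ['Logan Paulsen', 'ATL', 'TE', '15', '0', '0', '0'],
    ['Brian Hill', 'ATL', 'RB', '', '', '', ''],
]

print(blank_cells_per_row(sample))    # [0, 4]
print(first_row_with_blanks(sample))  # 1
```

Running the same check on the raw cell text of each tr element, rather than on tableList, would show whether the values are already missing in driver.page_source or are lost when the rows are paired up.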
