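I'm working on a small coding project to learn web scraping, and decided to extract a table from a fantasy football site I like, which can be found here: https://fantasydata.com/nfl/fantasy-football-leaders?position=1&team=1&season=2018&seasontype=1&scope=1&subscope=1&scoringsystem=2&aggregatescope=1&range=1
When I try to scrape the table, the first 10 rows come out fine, but starting with Brian Hill's row every value in the table is empty. I inspected the page, as I usually do when I run into a problem, and the rows after Hill's appear to follow the same structure as the rows before it. Any help fixing this, and an explanation of why it happens in the first place, would be much appreciated.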
import pandas
from bs4 import BeautifulSoup
from selenium import webdriver

URLA = 'https://fantasydata.com/nfl/fantasy-football-leaders?position='
URLB = '&team='
URLC = '&season='
URLD = '&seasontype=1&scope=1&subscope=1&scoringsystem=2&aggregatescope=1&range=3'
POSITIONNUMBER = [1,6,7]
TEAMNUMBER = [1]

def buildStatsTable(year):
    fullDF = pandas.DataFrame()
    fullLength = 0
    position = 1
    headers = ['Name', 'Team', 'Pos', 'GMS', 'PassingYards', 'PassingTDs', 'PassingINTs',
               'RushingYDs', 'RushingTDs', 'ReceivingRECs', 'ReceivingYDs', 'ReceivingTDs',
               'FUM LST', 'PPG', 'FPTS']
    for team in TEAMNUMBER:
        currURL = URLA + str(position) + URLB + str(team) + URLC + str(year) + URLD
        # Load the page in a real browser, then hand the rendered HTML to BeautifulSoup
        driver = webdriver.Chrome()
        driver.get(currURL)
        soup = BeautifulSoup(driver.page_source, "lxml")
        driver.quit()
        # The name rows and the stat rows both come back in this one list
        tr = soup.findAll('tr', {'role' : 'row'})
        length = len(tr)
        offset = length/2
        maxCap = int((length - 1)/2) + 1
        tableList = []
        for i, row in enumerate(tr[2:maxCap]):
            # Player name from the first half, stat cells from the matching row in the second half
            player = row.get_text().split('\n', 2)[1]
            player_row = [value.get_text() for value in tr[int(i + offset + 1)].contents]
            tableList.append([player] + player_row)
        teamDF = pandas.DataFrame(columns = headers, data = tableList)
        fullLength = fullLength + len(tableList)
        # DataFrame.append was removed in pandas 2.0; concat does the same job
        fullDF = pandas.concat([fullDF, teamDF])
    fullDF.index = list(range(0,fullLength))
    return fullDF

falcons = buildStatsTable(2018)
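As a side note on the URL assembly above, here is a minimal sketch of building the same query string from a dictionary with the standard library's `urlencode`, which makes each parameter easier to see and change. The names `BASE_URL`, `build_leaders_url`, and `range_id` are hypothetical and not from the original code. Note that the code uses `range=3` while the link in the question uses `range=1`; the default below follows the code.

```python
from urllib.parse import urlencode

BASE_URL = "https://fantasydata.com/nfl/fantasy-football-leaders"  # hypothetical constant name

def build_leaders_url(position, team, season, range_id=3):
    """Return the leaders-page URL for one position, team, and season."""
    # Same parameters, in the same order, as URLA..URLD in the question's code
    params = {
        "position": position,
        "team": team,
        "season": season,
        "seasontype": 1,
        "scope": 1,
        "subscope": 1,
        "scoringsystem": 2,
        "aggregatescope": 1,
        "range": range_id,
    }
    return BASE_URL + "?" + urlencode(params)

print(build_leaders_url(position=1, team=1, season=2018))
```

With these arguments the function should produce the same string as `URLA + str(position) + URLB + str(team) + URLC + str(year) + URLD`.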
Actual results (only the first few columns are shown to keep the post short; the problem is the same in every column):
               Name  Team  Pos  GMS  PassingYards  PassingTDs  PassingINTs  \
0         Matt Ryan   ATL   QB   16          4924          35            7
1       Julio Jones   ATL   WR   16             0           0            0
2     Calvin Ridley   ATL   WR   16             0           0            0
3     Tevin Coleman   ATL   RB   16             0           0            0
4      Mohamed Sanu   ATL   WR   16             5           1            0
5     Austin Hooper   ATL   TE   16             0           0            0
6         Ito Smith   ATL   RB   14             0           0            0
7      Justin Hardy   ATL   WR   16             0           0            0
8       Marvin Hall   ATL   WR   16             0           0            0
9     Logan Paulsen   ATL   TE   15             0           0            0
10       Brian Hill   ATL   RB
11  Devonta Freeman   ATL   RB
12     Russell Gage   ATL   WR
13     Eric Saubert   ATL   TE
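One way to narrow down where the blanks start is to check the parsed rows before they go into the DataFrame. This is only a diagnostic sketch, not a fix: the helper name `first_incomplete_row` is hypothetical, and the sample rows are the first seven columns of two rows from the output above.

```python
def first_incomplete_row(rows, expected_width):
    """Return the index of the first row that is short or has blank cells, else None."""
    for index, row in enumerate(rows):
        has_blank = any(str(cell).strip() == "" for cell in row)
        if len(row) != expected_width or has_blank:
            return index
    return None

# Two rows copied from the output above (first seven columns only)
sample = [
    ["Logan Paulsen", "ATL", "TE", "15", "0", "0", "0"],
    ["Brian Hill", "ATL", "RB", "", "", "", ""],
]
print(first_incomplete_row(sample, expected_width=7))  # prints 1
```

Calling it on `tableList` with `expected_width=len(headers)` gives the index of the first bad row, so the matching `tr` element can be printed and compared with what the browser's inspector shows for the same player.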