Python Selenium is not retrieving all matching class elements

Posted 2024-04-18 05:50:39


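I am using Python Selenium Webdriver to get some information from the following site: http://www.ukathletics.com/schedule-list/#!/m-basebl/2016

I am interested in pulling some links, dates, and team names. I wrote the code below to identify the information I am looking for, but it seems to get information only up to a certain point and then appends empty items (i.e. '') to my lists.

I know that every list should have 66 items if pulled correctly (Kentucky played 66 games). Do you know why it stops extracting information after the second LSU game?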

from selenium import webdriver
from selenium.webdriver.common.by import By

bs = []  # box score links
team2 = []  # opponents
dates = []  # dates of games
team1 = 'KENTUCKY'  # team of interest

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')

elem = driver.find_elements(By.CLASS_NAME, 'event_link')
for i in elem:
    bs.append(i.get_attribute('href'))
# de-duplicate while keeping first-seen order
links = sorted(set(bs), key=lambda x: bs.index(x))

elem = driver.find_elements(By.CLASS_NAME, 'school_name')
team2 = [i.text for i in elem if i.text != team1]

elem = driver.find_elements(By.CLASS_NAME, 'date')
for i in elem:
    dates.append(i.text.replace(',', '').replace('\n', ' '))

print(links)
print(team2)
print(dates)
print(len(links))
print(len(team2))
print(len(dates))

My results:

['http://www.ukathletics.com/game-center/580644ebe4b07dac0ca58a91/', 'http://www.ukathletics.com/game-center/5806455ce4b07dac0ca58a92/', 'http://www.ukathletics.com/game-center/58064594e4b09266491b651d/', 'http://www.ukathletics.com/game-center/5820d9dbe4b0493932cf30fd/', 'http://www.ukathletics.com/game-center/5820da33e4b0493932cf30fe/', 'http://www.ukathletics.com/game-center/5820da86e4b05e67c64470ca/', 'http://www.ukathletics.com/game-center/5820dabde4b0493932cf30ff/', 'http://www.ukathletics.com/game-center/5820daf4e4b05e67c64470cb/', 'http://www.ukathletics.com/game-center/5820db25e4b05e67c64470cc/', 'http://www.ukathletics.com/game-center/5820db6ce4b0493932cf3100/', 'http://www.ukathletics.com/game-center/5820db91e4b05e67c64470de/', 'http://www.ukathletics.com/game-center/5820dbb6e4b05e67c64470df/', 'http://www.ukathletics.com/game-center/5820dbe3e4b0493932cf3101/', 'http://www.ukathletics.com/game-center/5820dc0de4b05e67c64470e0/', 'http://www.ukathletics.com/game-center/58c1e98ee4b066e02ca82086/', 'http://www.ukathletics.com/game-center/5820dc32e4b05e67c64470e1/', 'http://www.ukathletics.com/game-center/5820dc80e4b0493932cf3102/', 'http://www.ukathletics.com/game-center/5820dcaae4b0493932cf3103/', 'http://www.ukathletics.com/game-center/5820dd1ee4b0493932cf3104/', 'http://www.ukathletics.com/game-center/5820dd6fe4b0493932cf3105/', 'http://www.ukathletics.com/game-center/5820dd8ce4b05e67c64470e3/', 'http://www.ukathletics.com/game-center/5820de21e4b05e67c64470e4/', 'http://www.ukathletics.com/game-center/5820de47e4b0493932cf3106/', 'http://www.ukathletics.com/game-center/5820de69e4b05e67c64470e5/', 'http://www.ukathletics.com/game-center/5820de87e4b0493932cf3107/', 'http://www.ukathletics.com/game-center/5820dea9e4b05e67c64470e6/', 'http://www.ukathletics.com/game-center/5820decee4b0493932cf3108/', 'http://www.ukathletics.com/game-center/5820deebe4b05e67c64470e7/', 'http://www.ukathletics.com/game-center/5820df0ce4b05e67c64470e8/', 
'http://www.ukathletics.com/game-center/5820df50e4b0493932cf3114/', 'http://www.ukathletics.com/game-center/5820df85e4b05e67c64470e9/', 'http://www.ukathletics.com/game-center/5820dfa9e4b05e67c64470ea/', 'http://www.ukathletics.com/game-center/5820dfc7e4b05e67c64470eb/', 'http://www.ukathletics.com/game-center/5820dfebe4b0493932cf3115/', 'http://www.ukathletics.com/game-center/5820e023e4b0493932cf3116/', 'http://www.ukathletics.com/game-center/5820e03ee4b0493932cf3117/', 'http://www.ukathletics.com/game-center/5820e056e4b0493932cf3118/', 'http://www.ukathletics.com/game-center/5820e089e4b0493932cf3119/', 'http://www.ukathletics.com/game-center/5820e0bee4b05e67c64470ed/', 'http://www.ukathletics.com/game-center/5820e0a4e4b05e67c64470ec/']
['NORTH CAROLINA', 'NORTH CAROLINA', 'NORTH CAROLINA', 'LIBERTY', "ST. JOSEPH'S", 'OLD DOMINION', 'DELAWARE', 'E. KENTUCKY', 'WKU', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'WRIGHT STATE', 'CINCINNATI', 'MIAMI (OH)', 'MIAMI (OH)', 'MIAMI (OH)', 'MURRAY STATE', 'TEXAS A&M', 'TEXAS A&M', 'TEXAS A&M', 'WKU', 'OLE MISS', 'OLE MISS', 'OLE MISS', 'CINCINNATI', 'VANDERBILT', 'VANDERBILT', 'VANDERBILT', 'LOUISVILLE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'UT MARTIN', 'MIZZOU', 'MIZZOU', 'MIZZOU', 'LOUISVILLE', 'LSU', 'LSU', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['FRI FEB 17', 'SAT FEB 18', 'SUN FEB 19', 'WED FEB 22', 'FRI FEB 24', 'SAT FEB 25', 'SUN FEB 26', 'TUE FEB 28', 'WED MAR 1', 'FRI MAR 3', 'SAT MAR 4', 'SUN MAR 5', 'TUE MAR 7', 'WED MAR 8', 'THU MAR 9', 'FRI MAR 10', 'SUN MAR 12', 'TUE MAR 14', 'FRI MAR 17', 'SAT MAR 18', 'SUN MAR 19', 'TUE MAR 21', 'THU MAR 23', 'FRI MAR 24', 'SAT MAR 25', 'TUE MAR 28', 'FRI MAR 31', 'SAT APR 1', 'SUN APR 2', 'TUE APR 4', 'FRI APR 7', 'SAT APR 8', 'SUN APR 9', 'WED APR 12', 'FRI APR 14', 'SAT APR 15', 'SUN APR 16', 'TUE APR 18', 'FRI APR 21', 'FRI APR 21', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
40
120
80

1 Answer
User
Reply 1 · Posted 2024-04-18 05:50:39

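In fact, not all of the elements are extracted because they have not been loaded yet. If you look closely, the elements at the bottom of the table only load once you scroll down to the end of the page.

You can try adding the following code after opening the page so that the full table loads: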

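import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
time.sleep(5)  # wait for the initial page load
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.CONTROL + Keys.END)
time.sleep(5)  # wait for the newly loaded rows
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.CONTROL + Keys.END)
  • Added waits so the page has time to load
  • Scrolled down twice to make sure the actual bottom of the table loads, in case the table is long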
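
If two fixed scrolls turn out not to be enough for a longer table, the same idea can be generalized by scrolling until the page height stops growing. The following is only a sketch, not something tested against this site: `scroll_until_stable` is a hypothetical helper name, the `pause` and `max_rounds` values are arbitrary, and it assumes the page scrolls at the window level rather than inside an inner container.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # height unchanged, so nothing new was loaded
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls


# Assumed usage with the driver from the answer above:
#   driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
#   time.sleep(5)
#   scroll_until_stable(driver)
```

The `max_rounds` cap is there so the loop cannot run forever on a page that keeps growing.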

I have tested it and it gave the following output:

66    #print(len(links))
198   #print(len(team2))
132   #print(len(dates))
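
The lengths alone do not show whether blank entries remain, since the team and date counts are multiples of 66 rather than 66 itself. A quick sanity check for the original symptom (empty strings in the lists) could look like the sketch below; `blank_count` is a hypothetical helper name that is not part of the original code, and the sample list is made up for illustration.

```python
def blank_count(items):
    """Return how many entries are empty or whitespace-only strings."""
    return sum(1 for item in items if not item.strip())


# Illustrative sample shaped like the tail of the broken output
sample = ['LSU', 'LSU', '', '']
print(blank_count(sample))  # prints 2
```

Running it on `team2` and `dates` after scrolling should give 0 if every row was loaded before the text was read.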
