Trouble simulating a click in Selenium and then scraping the new page's data after the click

Posted 2024-06-16 10:37:08


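I'm trying to simulate a click from this page (http://www.oddsportal.com/baseball/usa/mlb/results/) to the last page number at the bottom of the results. The click on the icon in my code seems to work, but after it I can't get the script to scrape the data from the page I actually want. Instead, it scrapes the data from the first, original URL. Any help would be greatly appreciated.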

```python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

url = 'http://www.oddsportal.com/baseball/usa/mlb/results/'

driver = webdriver.Chrome()
driver.get(url)

timeout = 5
dates2 = []

try:
    while True:
        try:
            # Wait for the "»|" (last page) link, then click it.
            element_present = EC.presence_of_element_located((By.LINK_TEXT, '»|'))
            WebDriverWait(driver, timeout).until(element_present)

            # Selenium 4 removed find_element_by_link_text().
            last_page_link = driver.find_element(By.LINK_TEXT, '»|')
            last_page_link.click()

            # Wait for a date header of the results table.
            # Note: the first page already has this <th>, so this wait can
            # succeed before the new page's results have replaced it.
            element_present2 = EC.presence_of_element_located((By.XPATH, ".//th[@class='first2 tl']"))
            WebDriverWait(driver, timeout).until(element_present2)

            content = driver.page_source
            soup = BeautifulSoup(content, 'lxml')

            # Collect the text of every date header, dropping the first match.
            dates2 = soup.find_all('th', {'class': 'first2'})
            dates2 = [element.text for element in dates2]
            dates2 = dates2[1:]
        except TimeoutException:
            print('Timeout Error!')
            continue  # retry; the browser is still open
        break
finally:
    # Quit once, after the loop, instead of before retrying.
    driver.quit()

print(dates2)
```
