Scraping multiple pages with a static URL

url = 'https://ethnicelebs.com/all-celeb'
driver = webdriver.Chrome()
driver.get(url)
while True:
    page = requests.post('https://ethnicelebs.com/all-celebs')
    soup = BeautifulSoup(page.text, 'html.parser')
    for href in soup.find_all('a', href=True)[18:]:
        print('Found the URL:{}'.format(href['href']))
        request_href = requests.get(href['href'])
        soup2 = BeautifulSoup(request_href.content)
        for each in soup2.find_all('strong')[:-1]:
            print(each.text)
    Next_button = (By.XPATH, "//*[@title='Go to next page']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(Next_button)).click()
    url = driver.current_url
    time.sleep(5)

1 Answer

Anonymous user

#1 · Posted 2024-04-25 00:04:05

I misunderstood your question in my previous answer because of the nested loops. The following code should work:

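import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://ethnicelebs.com/all-celeb'
driver = webdriver.Chrome()
while True:
    # Load the current listing page with Selenium and parse its HTML.
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Skip the first 18 links, as in the question's code.
    for href in soup.find_all('a', href=True)[18:]:
        print('Found the URL:{}'.format(href['href']))
        # Visit each linked page in the same browser session.
        driver.get(href['href'])
        soup2 = BeautifulSoup(driver.page_source, 'html.parser')
        for each in soup2.find_all('strong')[:-1]:
            print(each.text)

    # The inner loop navigated away, so return to the listing page
    # before looking for its "next page" button.
    driver.get(url)
    Next_button = (By.XPATH, "//*[@title='Go to next page']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(Next_button)).click()
    url = driver.current_url
    time.sleep(5)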

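In your code, you send a request through Selenium only once, at the start, and use requests for everything after that. To navigate and scrape the pages at the same time, use only Selenium, as in the example above.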
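Because the loop visits each link in the same browser session, it can help to pull the links out of the listing page as a plain list of absolute URLs before navigating anywhere. Below is a minimal sketch of that step. `LinkCollector` and `collect_links` are hypothetical names, not part of Selenium or BeautifulSoup, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_links(page_source, base_url, skip=0):
    """Return absolute URLs of the links in page_source, minus the first `skip`."""
    parser = LinkCollector()
    parser.feed(page_source)
    # urljoin resolves relative hrefs against the page they came from.
    return [urljoin(base_url, href) for href in parser.hrefs[skip:]]
```

With a live driver this might be called as `collect_links(driver.page_source, driver.current_url, skip=18)`, and the returned list can then be passed to `driver.get` one URL at a time.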
