Web scraper cannot click the pagination button

# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# specify the url
urlpage = 'https://www.ebay.com/b/Nike-Athletic-Apparel-for-Women/185082/bn_648725?rt=nc&LH_Sold=1'
print(urlpage)

# run firefox webdriver from executable path of your choice
driver = webdriver.Firefox()

# get web page
driver.get(urlpage)

for page_num in range(0, 2):
    parentElement = driver.find_element_by_class_name("s-item")
    results = parentElement.find_elements_by_css_selector("*")  # all children by CSS
    #button = driver.find_elements_by_class_name('ebayui-pagination__control')  # not working
    #button = driver.find_elements_by_xpath('//html/body/div[3]/div[3]/div[4]/section[1]/div[2]/nav/a[2]/span/svg[2]/use')  # not working
    button.click()
    print('Number of results', len(results))
    for r in results:
        print(r.text)

df = pd.DataFrame(results)
df.head()
df.to_csv('eBay_scrape.csv')
driver.quit()

https://www.ebay.com/b/Nike-Athletic-Apparel-for-Women/185082/bn_648725?rt=nc&LH_Sold=1
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-58b4e0e554fc> in <module>
     19 #results = parentElement.find_elements_by_tag_name("li") # not working...
     20 #results = driver.find_elements_by_class_name("vip") # 50 results per page. But useless...
---> 21 button = driver.find_elements_by_class_name('ebayui-pagination__control')
     22 #button = driver.find_elements_by_xpath('//html/body/div[3]/div[3]/div[4]/section[1]/div[2]/nav/a[2]/span/svg[2]/use')

IndexError: list index out of range

3 Answers

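User

Answer 1 · edited 2024-04-20 13:09:16

driver.find_elements_by_class_name('ebayui-pagination__control') returns a list, not a single element.

Two buttons on that page carry this class. To check, type the following in the Firefox console: $$('.ebayui-pagination__control')

So you need: button = driver.find_elements_by_class_name('ebayui-pagination__control')[1] to get the second button.

The second approach (finding the element by XPath) looks very fragile. The XPath is long, and a change to a single element anywhere along that path is enough to break it, even if you get it working at first.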

User

Answer 2 · edited 2024-04-20 13:09:16

You can use WebDriverWait together with element_to_be_clickable and an XPath locator:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//a[@class='ebayui-pagination__control'][@rel='next']"))).click()

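User

Answer 3 · edited 2024-04-20 13:09:16

You don't need to click the next-page button from code at all; just update the URL you are scraping.

If you look closely, &_pgn=<page_number> is appended to the URL of each subsequent page. You can simply scrape one page, then increment the page number until there are no more valid pages.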
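
One way to build those page URLs is sketched below using only the standard library. The helper name `page_url` is hypothetical; the `_pgn` parameter and the base URL come from this thread, and how the site responds to page numbers past the last page is not covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Base URL taken from the question
BASE_URL = (
    "https://www.ebay.com/b/Nike-Athletic-Apparel-for-Women/"
    "185082/bn_648725?rt=nc&LH_Sold=1"
)


def page_url(base_url, page_number):
    """Return base_url with its _pgn query parameter set to page_number."""
    parts = urlsplit(base_url)
    # Keep the existing parameters, dropping any earlier _pgn value
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_pgn"
    ]
    query.append(("_pgn", str(page_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Print the URLs for the first three pages
    for number in range(1, 4):
        print(page_url(BASE_URL, number))
```

Each generated URL can be passed to `driver.get(...)` in place of a button click, which sidesteps the element lookup entirely.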
