How to implement infinite scrolling with Selenium + Python

Asked 2025-04-14 15:28

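I'm using Python's Selenium library and want to load the entire JavaScript-generated list on this page: https://partechpartners.com/companies. There is a "LOAD MORE" button at the bottom of the page.

The code I wrote clicks this button (only once for now; I know I need to extend it to click repeatedly, probably with a while loop):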

from selenium import webdriver #The Selenium webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException, WebDriverException
from time import sleep

chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(options=chrome_options)

url = 'https://partechpartners.com/companies'

driver.get(url)

sleep(2)

load_more = driver.find_element('xpath','//*[ text() = "LOAD MORE"]')

sleep(2)

try:
    ActionChains(driver).move_to_element(load_more).click(load_more).perform()
    print("Element was clicked")
except Exception as e:
    print("Element wasn't clicked")

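The code runs and prints Element was clicked. However, when I add the following line at the bottom of the script, I get only 30 items, which is the same number I get when the button has not been clicked. The relative XPath of the elements is the same before and after the click, so I know that is not the problem: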

len(driver.find_elements('xpath','//h2'))

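I also tried commenting out chrome_options.add_argument("--headless") to see whether it works in a non-headless browser and to watch the click. An accept-cookies button appeared that I could not get rid of, but that does not seem to matter, because the script above still returns elements. What can I do to make sure the webdriver browser actually loads the page?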

Tags: loop control, javascript, automated testing, web scraping, webdriver, selenium, element location, infinite scroll

1 Answer

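As long as you don't wait for anything after the click, you will keep getting the same result.

In your case, look at what changes on the page after more items are loaded: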

The position of the "LOAD MORE" button changes
The number of displayed items increases
Once all items have been loaded, the button disappears

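So, after clicking, you can wait for the number of items to increase, or for the button's position to change (or both).

Then repeat this in a while loop until the "LOAD MORE" button disappears.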
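
An example based on the item count might look like the sketch below. The helper names wait_for_more_items and click_until_all_loaded are made up for illustration, and the polling loop is hand-written so the snippet depends only on driver.find_elements and element.click(); WebDriverWait with a lambda could do the same job. It assumes the cookie consent overlay has already been dismissed so the plain click is not intercepted.

```python
import time


def wait_for_more_items(driver, item_xpath, previous_count, timeout=20, poll=0.5):
    """Poll until more elements match item_xpath than previous_count.

    Returns the new count, or previous_count if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        count = len(driver.find_elements("xpath", item_xpath))
        if count > previous_count:
            return count
        time.sleep(poll)
    return previous_count


def click_until_all_loaded(driver, button_xpath, item_xpath, timeout=20, poll=0.5):
    """Click the button until it disappears or stops adding items.

    Returns the final number of elements matching item_xpath.
    """
    count = len(driver.find_elements("xpath", item_xpath))
    while True:
        # Re-locate the button on every pass instead of reusing an old reference.
        buttons = driver.find_elements("xpath", button_xpath)
        if not buttons:
            break  # button is gone: everything has been loaded
        buttons[0].click()
        new_count = wait_for_more_items(driver, item_xpath, count, timeout, poll)
        if new_count == count:
            break  # nothing new appeared in time; stop rather than loop forever
        count = new_count
    return count


# Hypothetical usage with a real driver, after accepting the consent dialog:
#   total = click_until_all_loaded(driver, "//*[text()='LOAD MORE']", "//h2")
#   print(total)
```

Stopping when the count does not grow is a design choice: it trades a possibly incomplete list for a guarantee that the loop ends even if the button never goes away.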

An example based on the button's position:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)

def wait_for_element_location_to_be_stable(element):
    # Return once the element's location has stayed unchanged for about 1 second.
    previous_location = element.location
    start_time = time.time()
    while time.time() - start_time < 1:
        current_location = element.location
        if current_location != previous_location:
            # The element moved: remember the new position and restart the timer.
            previous_location = current_location
            start_time = time.time()
        time.sleep(0.4)

def get_shadow_root(element):
    return driver.execute_script('return arguments[0].shadowRoot', element)

url = 'https://partechpartners.com/companies'

driver.get(url)
timeout = 20
wait = WebDriverWait(driver, timeout)

# Accept the cookie consent dialog, which is rendered inside a shadow root
shadow_host = wait.until(EC.presence_of_element_located((By.ID, 'usercentrics-root')))
shadow_container = get_shadow_root(shadow_host).find_element(By.CSS_SELECTOR, '[data-testid=uc-app-container]')
WebDriverWait(shadow_container, timeout).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid=uc-accept-all-button]'))).click()
wait.until(EC.invisibility_of_element_located((By.ID, 'usercentrics-root')))

# Scroll logic: keep clicking "LOAD MORE" until the button is no longer on the page
load_more_xpath = "//*[text()='LOAD MORE']"
load_more = wait.until(EC.visibility_of_element_located((By.XPATH, load_more_xpath)))

while driver.find_elements(By.XPATH, load_more_xpath):
    # Wait until the button has stopped moving, i.e. the previous batch has finished loading
    wait_for_element_location_to_be_stable(load_more)
    ActionChains(driver).move_to_element(load_more).click(load_more).perform()

titles = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[id*=id-] h2')))
for title in titles:
    print(title.text)

Answered 2025-04-14 by Python大师
