How to check whether a click in Selenium WebDriver changed the page, in Python

1 vote
1 answer
2856 views
Asked 2025-04-18 15:32

I have seen it said that clicks performed through Selenium's webdrivers are asynchronous, so I have been trying to make the webdriver wait for the click to finish before doing anything else. I am currently using PhantomJS as the browser.

I use a WebDriverWait object to wait for a particular element on the page to change (that is how I determine whether the page has loaded or changed after the click). My problem is that I keep getting a TimeoutException from the WebDriverWait.

Is there any way to wait for the page to finish loading after I click something? I don't want to use time.sleep(1), because the load time does not seem to be fixed and I don't want to sleep longer than necessary. That is why I want to wait explicitly for the page to load.
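For context, the general alternative to a fixed sleep is to poll for a condition with a deadline, sleeping only in short intervals. A minimal, framework-agnostic sketch of that pattern (the name `wait_until` and the interval are illustrative, not Selenium API):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the deadline
    passes. Returns the truthy value on success, or None on timeout, so
    the caller can tell "changed" apart from "gave up"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    return None
```

Selenium's WebDriverWait implements essentially this loop; the difference is that its until() raises TimeoutException on expiry instead of returning None.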

Here is the code I use to set up the webdriver and the wait:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium.webdriver.support.ui as ui
import selenium.common.exceptions as exceptions

class Webdriver():

    def __init__(self, wait_time=10):
        self.driver = webdriver.PhantomJS()
        self.driver.set_window_size(1200,800)
        self.wait = wait_time

    def click(self, element_xpath, wait_xpath, sleep_time=0):
        wait = ui.WebDriverWait(self.driver, self.wait)
        # Remember the watched element's text so we can tell when it changes.
        old_element = self.driver.find_element_by_xpath(wait_xpath)
        old_text = old_element.text
        self.driver.find_element_by_xpath(element_xpath).click()
        # Note: each call to element_changed below can block for up to 20 s,
        # which is longer than the 10 s WebDriverWait budget set above.
        wait.until(lambda driver: element_changed(driver, wait_xpath, old_text, 20))
        time.sleep(sleep_time)

def element_changed(driver, element_xpath, old_element_text, timeout_seconds=10):
    pause_interval = 1
    t0 = time.time()
    while time.time() - t0 < timeout_seconds:
        try:
            element = driver.find_element_by_xpath(element_xpath)
            if element.text != old_element_text:
                return True
        except exceptions.StaleElementReferenceException:
            return True
        except exceptions.NoSuchElementException:
            pass
        time.sleep(pause_interval)
    return False
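One interaction in the code above is worth noting: wait.until() is given a predicate that itself blocks for up to 20 seconds, while the WebDriverWait budget is only self.wait = 10 seconds. WebDriverWait checks its deadline only between predicate calls, so when the element never changes, the first predicate call consumes the whole inner timeout, and until() then sees its own deadline passed and raises TimeoutException. A scaled-down, pure-Python simulation of that interaction (times shrunk to fractions of a second; `simulated_until` is a simplified model, not the real WebDriverWait):

```python
import time

def simulated_until(predicate, timeout):
    """Simplified model of WebDriverWait.until: call the predicate,
    but only check the deadline between predicate calls."""
    end = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() > end:
            raise TimeoutError("outer wait expired")
        time.sleep(0.01)

def blocking_predicate():
    """Models element_changed(..., timeout_seconds=20): it blocks past
    the outer deadline before reporting failure."""
    time.sleep(0.3)   # inner timeout, longer than the outer budget
    return False

try:
    simulated_until(blocking_predicate, timeout=0.1)  # outer budget shorter
    outcome = "no timeout"
except TimeoutError:
    outcome = "timeout"
```

With the real classes, the same mismatch means a never-changing element always surfaces as a WebDriverWait TimeoutException, regardless of the inner 20-second loop.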

Here is the code that drives it:

driver = Webdriver()
url = 'http://www.atmel.com/products/microcontrollers/avr/default.aspx?tab=parameters'
wait_xpath = '//*[@id="device-columns"]/tbody/tr[2]/td[1]/div[2]/a'
driver.load(url, wait_xpath)
soup = driver.get_soup()

pages = soup('ul', class_='pagination')[0]('a')
num_pages = len(pages)
products = set()
for i in range(num_pages):
    element_xpath = '//*[@id="top-nav"]/div/ul/li[%d]/a' % (2 + i)
    driver.click(element_xpath, wait_xpath)
    soup = driver.get_soup()
    for tag in soup('td', class_='first-cell'):
        product = tag.find('div', class_='anchor')
        if not product:
            continue
        else:
            if product.find('a'):
                products.add(product.find('a')['href'])

Update

Part of my problem was that I was expecting the first page to change when I was actually just reloading it. But even allowing for that, after moving the click and the parsing into the for loop, it sometimes still takes too long for the element to change.

1 Answer

1

Instead of using WebDriverWait, I wrote my own function to wait for the page to load. It seems to work for now, but I suspect it is fragile and may fail at times.

def click(self, element_xpath, wait_xpath=None, sleep_time=0):
    if wait_xpath:
        old_element = self.driver.find_element_by_xpath(wait_xpath)
        old_text = old_element.text
    self.driver.find_element_by_xpath(element_xpath).click()
    if wait_xpath:
        if not element_changed(self.driver, wait_xpath, old_text):
            # log is a module-level logging.Logger; warning() is the
            # non-deprecated spelling of warn()
            log.warning('click did not change element at %s', wait_xpath)
            return False
    time.sleep(sleep_time)
    return True

def element_changed(driver, element_xpath, old_element_text, timeout_seconds=10):
    pause_interval = 1
    t0 = time.time()
    while time.time() - t0 < timeout_seconds:
        try:
            element = driver.find_element_by_xpath(element_xpath)
            if element.text != old_element_text:
                return True
        except exceptions.StaleElementReferenceException:
            return True
        except exceptions.NoSuchElementException:
            pass
        time.sleep(pause_interval)
    return False

And this is the driving code:

driver = Webdriver()
url = 'http://www.atmel.com/products/microcontrollers/avr/default.aspx?tab=parameters'
wait_xpath = '//*[@id="device-columns"]/tbody/tr[2]/td[1]/div[2]/a'
driver.load(url, wait_xpath)
soup = driver.get_soup()

pages = soup('ul', class_='pagination')[0]('a')
num_pages = len(pages)
products = set()
for i in range(num_pages):
    element_xpath = '//*[@id="top-nav"]/div/ul/li[%d]/a' % (2 + i)
    if i == 0:
        driver.click(element_xpath, None, 1)
    else:
        driver.click(element_xpath, wait_xpath, 1)
    soup = driver.get_soup()
    for tag in soup('td', class_='first-cell'):
        product = tag.find('div', class_='anchor')
        if not product:
            continue
        else:
            if product.find('a'):
                products.add(product.find('a')['href'])
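The text-comparison wait used here breaks down exactly in the case from the update: when the "new" page renders the same text as the old one. Selenium ships a different signal for this in expected_conditions.staleness_of: grab a reference to any element before the click, then wait until touching that reference raises StaleElementReferenceException, which means the old DOM node was replaced even if the visible text is identical. A minimal model of that idea, with `StaleError` and `FakeElement` as illustrative stand-ins for the real driver objects so the sketch runs without a browser:

```python
import time

class StaleError(Exception):
    """Stand-in for selenium's StaleElementReferenceException."""

def wait_for_staleness(element, timeout=10.0, interval=0.2):
    """Return True once the element reference goes stale (the page
    replaced the node), False if it stays attached until the deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            element.text  # touching a detached node raises
        except StaleError:
            return True
        time.sleep(interval)
    return False

class FakeElement:
    """Test double: becomes stale after a set number of .text reads."""
    def __init__(self, stale_after):
        self.reads = 0
        self.stale_after = stale_after

    @property
    def text(self):
        self.reads += 1
        if self.reads > self.stale_after:
            raise StaleError()
        return "page 1"
```

With a real driver the equivalent one-liner is `WebDriverWait(driver, 10).until(EC.staleness_of(old_element))`, where `EC` is `selenium.webdriver.support.expected_conditions` and `old_element` was looked up before the click.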
