How to extract and save links with Selenium

Asked 2025-04-12 19:32

I'm new to Selenium, and as a first exercise I'm trying to save all the links on a page. I adapted my code from an example, but I can't get it to work. What am I doing wrong?

Here is my code:

import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url = "https://www.python.org"

driver = webdriver.Chrome()
driver.get(url)
driver.minimize_window()

links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))


for link in links:
    href = link.get_attribute("href")
    if href is not None:
        print(href)

driver.quit()

Also, how can I save these links so that I can click them later when I need to?

This is the error message I get:

Traceback (most recent call last):
  File "/media/joe-2/Ubuntu-Storage1/Exuma-Snoops/RGD-code/link-test.py", line 21, in <module>
    links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))
  File "/home/joe-2/.local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 105, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

My operating system is Ubuntu 22.04.

1 Answer


There are two problems:

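  1. Not every link on the page is visible; some exist only in the page structure (the DOM) without being displayed. visibility_of_all_elements_located succeeds only when every matched element is visible, so change it to presence_of_all_elements_located.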

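  2. By. LINK_TEXT, "a" is also wrong. LINK_TEXT matches a link by its exact visible text, and the page has no link whose text is "a", so nothing is ever found and the wait times out. Change the locator to By.XPATH, "//a", which matches every link tag (<a>) in the page structure.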

So this line is wrong:

links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))

Change it to this:

links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a")))
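
As for saving the links so you can use them later: the element objects Selenium returns belong to the browser session that found them, so they stop working once you call driver.quit(). What you can save is the href strings, and later open one with driver.get(href). Below is a minimal sketch of that idea, not the only way to do it. The function names (unique_hrefs, save_links, load_links, collect_links) and the file name links.json are my own, chosen for illustration, and I'm assuming a JSON file is an acceptable place to keep the links.

```python
import json
from pathlib import Path


def unique_hrefs(hrefs):
    """Drop None/empty values and duplicates while keeping page order."""
    seen = set()
    result = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def save_links(hrefs, path):
    """Write the links to a JSON file so a later run can reload them."""
    Path(path).write_text(json.dumps(hrefs, indent=2), encoding="utf-8")


def load_links(path):
    """Read back the list of links written by save_links."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def collect_links(url, timeout=10):
    """Open url in Chrome and return the href of every <a> in the DOM."""
    # Imported here so the file helpers above also work without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait for presence (not visibility) of all <a> elements.
        anchors = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, "//a"))
        )
        return unique_hrefs(a.get_attribute("href") for a in anchors)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Example usage (this launches a real Chrome window):
#   links = collect_links("https://www.python.org")
#   save_links(links, "links.json")
# Later, in another run:
#   links = load_links("links.json")
#   driver.get(links[0])   # "clicking" a saved link means navigating to it
```

If you need to click a link inside the page itself rather than just navigate to its address, you would have to locate the element again in the new session, for example by its href, and call click() on it there.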
