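How to extract and save links in Selenium
I'm new to Selenium, and my first attempt was to save all the links on a page. I took the code from an example, but I can't get it to work. What am I doing wrong?
Here is my code: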
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://www.python.org"
driver = webdriver.Chrome()
driver.get(url)
driver.minimize_window()
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))
for link in links:
    href = link.get_attribute("href")
    if href is not None:
        print(href)
driver.quit()
Also, how can I save these links so that I can click them later when I need to?
This is the error message I get:
Traceback (most recent call last):
  File "/media/joe-2/Ubuntu-Storage1/Exuma-Snoops/RGD-code/link-test.py", line 21, in <module>
    links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))
  File "/home/joe-2/.local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 105, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
My operating system is Ubuntu 22.04.
1 Answer
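Two problems:
Not all of the links are visible; some of them exist only in the page's DOM without being displayed. visibility_of_all_elements_located waits until every matched element is visible, so it times out. Change it to presence_of_all_elements_located, which only requires the elements to be present in the DOM.
By. LINK_TEXT, "a" is also wrong. By.LINK_TEXT matches links whose visible text is exactly "a", and the page has no such link. Change the locator to By.XPATH, "//a", which finds every link tag (<a>) in the DOM.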
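To check both points on the live page yourself, here is a minimal diagnostic sketch. The helper name split_by_visibility is hypothetical, and the script assumes webdriver.Chrome() can start Chrome as in the question. It uses By.TAG_NAME, "a", which selects the same <a> elements as By.XPATH, "//a", then counts how many are displayed and how many elements By.LINK_TEXT, "a" matches. The exact numbers depend on the page at the time you run it.

```python
def split_by_visibility(elements):
    """Split elements into (visible, hidden) lists using is_displayed()."""
    visible, hidden = [], []
    for element in elements:
        (visible if element.is_displayed() else hidden).append(element)
    return visible, hidden


def main():
    # Selenium is imported here so the helper above can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.python.org")
        anchors = driver.find_elements(By.TAG_NAME, "a")
        visible, hidden = split_by_visibility(anchors)
        print(f"<a> elements: {len(anchors)}, visible: {len(visible)}, hidden: {len(hidden)}")
        # LINK_TEXT matches a link's exact visible text, not its tag name.
        print("By.LINK_TEXT 'a' matches:", len(driver.find_elements(By.LINK_TEXT, "a")))
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
```

If the hidden count is greater than zero, visibility_of_all_elements_located on //a would also time out, which is why presence_of_all_elements_located is the better fit here.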
So this line is wrong:
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By. LINK_TEXT, "a")))
Change it to:
links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a")))
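For the second part of the question (saving the links so you can click them later), here is a minimal sketch, assuming a plain text file named links.txt (a hypothetical name) is an acceptable place to store them. It collects the href strings with the corrected wait, drops duplicates, saves one URL per line, and later "clicks" a saved link by passing its URL to driver.get(). Saving the WebElement objects themselves would not work, because they become stale once the browser navigates away or the driver quits. The function names are illustrative, not part of Selenium.

```python
from pathlib import Path

# Hypothetical file name for the saved links (one URL per line).
LINKS_FILE = Path("links.txt")


def unique_hrefs(hrefs):
    """Drop None/empty values and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def save_links(links, path=LINKS_FILE):
    """Write one URL per line."""
    path.write_text("\n".join(links) + "\n", encoding="utf-8")


def load_links(path=LINKS_FILE):
    """Read the URLs back, skipping blank lines."""
    return [line for line in path.read_text(encoding="utf-8").splitlines() if line]


def scrape_links(url, timeout=10):
    # Selenium is imported here so the file helpers above run without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        links = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, "//a"))
        )
        # Copy the href strings now; the WebElements themselves become
        # unusable (stale) after navigation or driver.quit().
        return unique_hrefs(link.get_attribute("href") for link in links)
    finally:
        driver.quit()


def open_saved_link(index, path=LINKS_FILE):
    """'Click' a saved link later by navigating directly to its URL."""
    from selenium import webdriver

    url = load_links(path)[index]  # read first so no browser is left open on error
    driver = webdriver.Chrome()
    driver.get(url)
    return driver  # the caller should call driver.quit() when done


if __name__ == "__main__":
    hrefs = scrape_links("https://www.python.org")
    for href in hrefs:
        print(href)
    save_links(hrefs)
```

Later, something like driver = open_saved_link(0) opens the first saved URL in a new browser window; call driver.quit() on it when you are finished.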