How to extract URLs from a table using Selenium WebDriver

Posted 2024-04-26 21:15:08


Hello, I am trying to extract the URLs of all basketball events from the table on this page: https://www.oddsportal.com/matches/basketball/20200907/

Here is my Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
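# The options.headless setter was removed in recent Selenium 4 releases; pass the flag instead.
options.add_argument("--headless")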
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)

driver.get("https://www.oddsportal.com/matches/basketball/20200907/")

url_links = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[contains(@href, '/basketball/')]")))]

print(len(url_links), '\n')

print(url_links, '\n')

driver.close()
driver.quit()

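The output contains the URLs from the event table as well as links from other tables on the page. In my case I only want the 9 URLs that point to the 9 basketball events. How can I filter these URLs?

Thanks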
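
One possible way to narrow the list, sketched under an assumption that is not confirmed by the post: that event links have a deeper path than league or navigation links, for example /basketball/<country>/<league>/<event>/ with four segments versus three or fewer for everything else. The function name filter_event_urls, the min_segments parameter, and the sample URLs below are all hypothetical; check the real hrefs printed by the script before relying on the threshold.

```python
from urllib.parse import urlparse


def filter_event_urls(urls, sport="basketball", min_segments=4):
    """Keep links whose path starts with <sport> and has at least min_segments parts.

    Assumption (not verified against the site): event pages look like
    /<sport>/<country>/<league>/<event>/, while league pages are shorter.
    Duplicates and empty hrefs are dropped; original order is preserved.
    """
    seen = set()
    events = []
    for url in urls:
        if not url:  # get_attribute("href") can return None
            continue
        segments = [part for part in urlparse(url).path.split("/") if part]
        if len(segments) < min_segments or segments[0] != sport:
            continue
        if url not in seen:
            seen.add(url)
            events.append(url)
    return events


# Hypothetical hrefs, made up for illustration only.
sample_links = [
    "https://www.oddsportal.com/matches/basketball/20200907/",
    "https://www.oddsportal.com/basketball/usa/nba/",
    "https://www.oddsportal.com/basketball/usa/nba/team-a-team-b-abc123/",
    "https://www.oddsportal.com/basketball/usa/nba/team-a-team-b-abc123/",
    None,
]
event_links = filter_event_urls(sample_links)
print(len(event_links), event_links)
```

In the script above this would be called as filter_event_urls(url_links) after the list comprehension. If the path-depth assumption does not hold, an alternative is to make the XPath itself more specific by scoping it to the container element of the event table, whose id or class has to be looked up in the page source.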