Getting text from a page with Selenium and Python when the website changes the text's position

3 Answers

User

Answer 1 · edited 2024-04-28 14:48:52

Yes, you will probably need to bring in XPath axes:

XPATH:

//strong[text()='Last matches']/ancestor::div[contains(@class,'component-header no-margin')]/../following-sibling::div[1]/descendant::table/descendant::td[5]/div/child::div[2]/div

See the documentation on XPath axes for more information.
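To make the axes in that expression easier to follow, here is a minimal sketch that rebuilds the same XPath from its individual steps. The `STEPS` list and the notes in it are only an illustration, not part of the original answer.

```python
# Each step of the XPath above, paired with a short note on the axis it uses.
STEPS = [
    ("//strong[text()='Last matches']",
     "anchor: the <strong> whose text is exactly 'Last matches'"),
    ("ancestor::div[contains(@class,'component-header no-margin')]",
     "ancestor axis: walk up to the enclosing header div"),
    ("..",
     "parent axis (abbreviated): the element containing the header div"),
    ("following-sibling::div[1]",
     "following-sibling axis: the first div after that element"),
    ("descendant::table",
     "descendant axis: any table nested inside that div"),
    ("descendant::td[5]",
     "descendant axis: the fifth td below the table, in document order"),
    ("div", "child axis (implicit): a div directly inside the td"),
    ("child::div[2]", "child axis (explicit): its second div child"),
    ("div", "child axis (implicit): the div inside that one"),
]

# Joining the steps with '/' reproduces the full expression.
xpath = "/".join(step for step, _note in STEPS)

if __name__ == "__main__":
    for step, note in STEPS:
        print(f"{step}\n    {note}")
```

With Selenium 4 the resulting string can be passed to `driver.find_element(By.XPATH, xpath)`.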

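User

Answer 2 · edited 2024-04-28 14:48:52

This was tested against both of your links.

The problem is that the HTML contains two separate tables (a left one and a right one) for Last matches. To get all the results you need to iterate over both. I use an f-string below to make the XPath dynamic, because the XPath of the two tables is identical except for the number between the brackets [].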

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://s5.sir.sportradar.com/sports4africa/en/1/season/82128/headtohead/613958/33714/match/27197856")

# The first two tables with class 'table' hold the "Last matches" rows.
tables = [1, 2]
results = []
for table in tables:
    last_match_table = f"(//table[@class='table'])[{table}]//tbody/tr"
    # Wait up to 10 seconds for the rows of this table to be present in the DOM.
    scores = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, last_match_table))
    )
    for score in scores:
        results.append(score.get_attribute("innerText"))

for row in results:
    # Drop the first four whitespace-separated tokens; the rest is the teams and the score.
    text_split = row.split()
    final = ' '.join(text_split[4:])
    print(final)

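Note that I also used a more generic XPath, so it is not affected when the DOM changes (as you have seen). The path //table[@class='table'] matches 4 tables on that page: 2 for Last matches and 2 for Next matches. We only want the first 2, so the code iterates over the list tables (1 and 2) to fill in the index in the XPath.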
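The indexing idea can be isolated in a small helper, shown here as a sketch. The function name `table_row_xpaths` and its parameters are hypothetical. The defaults assume, as described above, that only the first two tables with class `table` are wanted.

```python
def table_row_xpaths(table_count=2, table_class="table"):
    """Build one row XPath per table, using 1-based XPath indexing.

    Hypothetical helper: table_count=2 mirrors the two "Last matches"
    tables described in the answer.
    """
    return [
        f"(//table[@class='{table_class}'])[{index}]//tbody/tr"
        for index in range(1, table_count + 1)
    ]


if __name__ == "__main__":
    for path in table_row_xpaths():
        print(path)
```

The parentheses around `//table[@class='table']` matter: they make `[1]` and `[2]` index into the whole set of matching tables rather than into each table's position among its siblings.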

Result:

Bolivar 2:0 CD Real Tomayapo
CD Real Tomayapo 2:1 Blooming
Guabira 0:2 CD Real Tomayapo
CD Real Tomayapo 0:0 Real Potosi
Royal Pari 4:2 CD Real Tomayapo
CD Real Tomayapo 1:0 Always Ready
Aurora 3:0 Independiente Petrolero
Aurora 1:1 Bolivar
Blooming 1:0 Aurora
Aurora 2:1 Guabira
Real Potosi 1:1 Aurora
Aurora 0:8 Royal Pari
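If the lines need to be split into teams and goals afterwards, one option is to locate the score with a regular expression. This is a sketch, not part of the answer: `parse_match` is a hypothetical helper. It assumes it receives the already trimmed lines printed above, which contain exactly one `digits:digits` score.

```python
import re

# A score such as "2:0": digits, a colon, digits.
SCORE = re.compile(r"\b(\d+):(\d+)\b")


def parse_match(line):
    """Split 'Home 2:0 Away' into (home, home_goals, away_goals, away).

    Returns None when the line contains no score.
    """
    match = SCORE.search(line)
    if match is None:
        return None
    home = line[:match.start()].strip()
    away = line[match.end():].strip()
    return home, int(match.group(1)), int(match.group(2)), away


if __name__ == "__main__":
    print(parse_match("Bolivar 2:0 CD Real Tomayapo"))
```

Because the score is used as the separator, team names made of several words (such as CD Real Tomayapo) stay intact.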

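User

Answer 3 · edited 2024-04-28 14:48:52

Another good option is to use ancestor in the XPath. I tied the main locator to the table name, which makes it more reliable.

Using it, you can find the other locators and their text; just put them in a loop with the correct paths. In the child XPath, .//td selects td elements that are descendants of the main locator: the leading dot makes the search relative to that element rather than to the whole document.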
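The effect of the leading dot can be tried without a browser. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, on a small made-up table. The markup in `SAMPLE` is hypothetical and much simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two rows, each with a team cell and a score cell.
SAMPLE = """
<table>
  <tbody>
    <tr><td>Bolivar</td><td><div><div>2</div><div>:</div><div>0</div></div></td></tr>
    <tr><td>Aurora</td><td><div><div>1</div><div>:</div><div>1</div></div></td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(SAMPLE)

# Searching from the root finds every td in the document.
all_cells = root.findall(".//td")

rows = root.findall(".//tr")
cells_per_row = []
scores = []
for row in rows:
    # Relative search: only the td elements below this particular row.
    cells = row.findall(".//td")
    cells_per_row.append(len(cells))
    # "./div/div" follows direct children only: the inner divs of the score cell.
    parts = [div.text for div in cells[1].findall("./div/div")]
    scores.append("".join(parts))

if __name__ == "__main__":
    print(len(all_cells), cells_per_row, scores)
```

In Selenium the same principle applies to `element.find_element(By.XPATH, ".//td")`: without the dot, `//td` is evaluated against the whole document even when called on an element.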

My solution:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


url = 'https://s5.sir.sportradar.com/sports4africa/en/1/season/80526/headtohead/334075/340986/match/27195664'
# Selenium 4 takes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(service=Service('/snap/bin/chromium.chromedriver'))
driver.get(url)

# Main locator: all rows under the sixth div ancestor of the "Last matches" title.
rows_xpath = "//strong[text()='Last matches']/ancestor::div[6]//tbody/tr"
# The explicit wait returns the located rows, so no separate lookup or implicit wait is needed.
rows = WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located((By.XPATH, rows_xpath)))
output = []
for res in rows:
    # Relative XPath: the score container in the fifth cell of this row.
    score = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']").get_attribute("innerText")
    output.append(score)
print(output)

Output:
First link: ['0:4', '3:4', '2:2', '0:1', '3:0', '2:2', '0:4', '1:0', '2:1', '1:1', '1:2', '2:4']

Second link: ['2:0', '2:1', '0:2', '0:0', '4:2', '1:0', '3:0', '1:1', '1:0', '2:1', '1:1', '0:8']

Update: the quickest way I could find to swap the scores is to fetch the two scores separately, put each in its own list, and then swap them with zip.

first_score = []
second_score = []
for res in rows:
    # The first and third inner divs hold the two scores.
    first = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']/div[1]").get_attribute("innerText")
    first_score.append(first)
    second = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']/div[3]").get_attribute("innerText")
    second_score.append(second)
# Pair the scores in the original order and in swapped order.
first_list = list(zip(first_score, second_score))
second_list = list(zip(second_score, first_score))
print(first_list)
print(second_list)

The result is two lists of tuples:

[('0', '4'), ('3', '4'), ('2', '2'), ('0', '1'), ('3', '0'), ('2', '2'), ('0', '4'), ('1', '0'), ('2', '1'), ('1', '1'), ('1', '2'), ('2', '4')]
[('4', '0'), ('4', '3'), ('2', '2'), ('1', '0'), ('0', '3'), ('2', '2'), ('4', '0'), ('0', '1'), ('1', '2'), ('1', '1'), ('2', '1'), ('4', '2')]

There are more efficient ways to do this, but I suggest asking about that in a separate question.
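As one possible alternative, the swap can also be done on the score strings that were already collected, without two extra element lookups per row. This is a sketch under that assumption. The helper names `split_scores` and `swap_pairs` are hypothetical, and the input is expected in the 'home:away' form shown in the output above.

```python
def split_scores(scores):
    """Turn ['0:4', '3:4'] into [('0', '4'), ('3', '4')]."""
    return [tuple(score.split(":")) for score in scores]


def swap_pairs(pairs):
    """Swap the two entries of every (first, second) tuple."""
    return [(second, first) for first, second in pairs]


if __name__ == "__main__":
    collected = ['0:4', '3:4', '2:2']  # first entries of the output for the first link
    first_list = split_scores(collected)
    second_list = swap_pairs(first_list)
    print(first_list)
    print(second_list)
```

The scores stay strings here, as in the tuples above; wrapping the values in `int()` would be needed for arithmetic.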
