Scraping a table of links

Published 2024-04-20 15:33:03


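I'm writing a Python script that walks through a table with three columns. I build a list and add the link from the first column of each row to it, then loop over that list. On each iteration I click the link, print a statement to confirm that the click really went through, and then go back to the previous page so that the next link can be clicked. The error I keep getting is this: the loop gets through the first two links, but when it calls links[page].click() the third time, I get a StaleElementReferenceException. I can't post the HTML because the site is confidential.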

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select
    import traceback


    # starting chrome browser
    chrome_path = r"C:\Users\guaddavi\Downloads\chromedriver_win32    extract\chromedriver.exe"
    browser = webdriver.Chrome(chrome_path)


    #linking to page
    browser.get('link to page with table ')


    #find table of ETL Extracts
    table_id = browser.find_element_by_id('sortable_table_id_0')
    #print('found table')

    #get all the rows of the table containing the links
    rows = table_id.find_elements_by_tag_name('tr')

    #remove the first row that has the header
    del rows[0]
    current = 0
    links = [] * len(rows)

    for row in rows:
     col = row.find_elements_by_tag_name('td')[0]
     links.append(col)
     current +=1

    page = 0
    while(page <= len(rows)):
        links[page].click()
        print('clicked link' + "  " + str(page))
        page += 1
        browser.back()     

1 Answer
User
#1 · Posted 2024-04-20 15:33:03

I'm not sure whether you have already looked at the official Selenium documentation:

A stale element reference exception is thrown in one of two cases, the first being more common than the second: The element has been deleted entirely. The element is no longer attached to the DOM.

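In your case, I think it is the second one. Every time you click and go back inside the loop, the DOM changes, so the elements stored in links before the loop are no longer attached to the current DOM. Please check that.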
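One way to act on this is to look the cells up again on every iteration instead of reusing references collected before the first click. The following is only a sketch under assumptions: the helper names first_column_cells and click_each_link are made up for illustration, the table id is taken from the question, and it assumes the table is present again as soon as browser.back() returns (if the page renders it late, an explicit wait before re-locating may be needed). It uses the find_element(By.ID, ...) form rather than the find_element_by_id helpers from the question.

```python
try:
    from selenium.webdriver.common.by import By
    BY_ID, BY_TAG = By.ID, By.TAG_NAME
except ImportError:
    # Fallback so the sketch can be imported without Selenium installed;
    # these are the plain string values behind By.ID and By.TAG_NAME.
    BY_ID, BY_TAG = "id", "tag name"


def first_column_cells(browser, table_id):
    """Locate the table again and return the first <td> of each data row."""
    table = browser.find_element(BY_ID, table_id)
    rows = table.find_elements(BY_TAG, "tr")[1:]  # skip the header row
    return [row.find_elements(BY_TAG, "td")[0] for row in rows]


def click_each_link(browser, table_id="sortable_table_id_0"):
    """Click every first-column cell, re-locating the cells after each back()."""
    total = len(first_column_cells(browser, table_id))
    for page in range(total):  # range() stops before total, unlike `<=`
        # Fresh references for the DOM as it is right now.
        cells = first_column_cells(browser, table_id)
        cells[page].click()
        print("clicked link " + str(page))
        browser.back()
    return total
```

With the browser from the question already on the table page, this would be called as click_each_link(browser). The index page is the only state carried across navigations, so no element reference outlives the DOM it was found in.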
