Problem adding a scraping condition to an infinite scroll in Python

Posted 2024-06-16 10:41:25


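I am scraping a website that uses infinite scrolling. My Selenium scrolling loop works fine on its own, but as soon as I add a condition inside it, it only records data up to the first scroll. What could the problem be?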

(The exact same scrolling loop works without the condition.)

My code:

last_height = driver.execute_script("return document.body.scrollHeight")

while True:

    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(randint(1,10))
    # The condition I added (the next two lines):
    for a in page.find_all('a', href=True):
        print("Found the URL:", a['href'])


    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

1 answer
User
#1 · Posted 2024-06-16 10:41:25
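import time
from random import randint

from bs4 import BeautifulSoup

# `driver` is assumed to be an already created Selenium WebDriver
# with the target page loaded.


def scroll(driver, timeout):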
    scroll_pause_time = timeout

    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(scroll_pause_time)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # If heights are the same it will exit the function
            break
        last_height = new_height

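# Scroll all the way down first, then parse the page once.
# Parsing only after scrolling means page_source contains everything loaded so far.
scroll(driver, randint(2, 5))
page = BeautifulSoup(driver.page_source, 'lxml')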
# Number each link starting from 1
for count, a in enumerate(page.find_all('a', href=True), start=1):
    print(count)
    print("Found the URL:", a['href'])
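
The answer above parses the page once, after scrolling has finished. If you do want to collect links inside the loop, as in the question, the page source probably has to be re-read on every pass; a `page` object built before the loop would not see content loaded later. Here is a minimal sketch of that variant. The names `HrefCollector`, `extract_hrefs`, `scroll_and_collect` and the `max_scrolls` cap are hypothetical, and the standard library's `html.parser` is swapped in for BeautifulSoup only so the sketch runs without extra packages. The only things it assumes about `driver` are `execute_script` and `page_source`, which a Selenium WebDriver provides.

```python
import time
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all non-empty <a href> values found in an HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    return parser.hrefs


def scroll_and_collect(driver, pause, max_scrolls=50):
    """Scroll to the bottom repeatedly, re-reading page_source after each scroll.

    max_scrolls is an assumed safety cap so the loop cannot run forever.
    """
    ordered = []   # links in the order they were first seen
    seen = set()   # fast membership check for de-duplication
    last_height = driver.execute_script("return document.body.scrollHeight")

    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)

        # Re-parse the current DOM; a snapshot taken before the loop would be stale.
        for href in extract_hrefs(driver.page_source):
            if href not in seen:
                seen.add(href)
                ordered.append(href)

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height

    return ordered
```

With a real driver this could be called as `links = scroll_and_collect(driver, pause=3)`. Collecting on every pass may also help on sites that remove older items from the DOM while scrolling, where a single parse at the end could miss them.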
