Selenium: retrieving data that loads on scroll

4 votes

2 answers

5910 views

Asked 2025-04-17 14:03

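I'm trying to collect the elements on a page that loads content on scroll the way Twitter does: as you scroll down, more content is loaded automatically. For some reason this isn't working. I added some print statements to debug, but I always get the same number of items, and then the function returns. What am I doing wrong?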

wd = webdriver.Firefox()
wd.implicitly_wait(3)

def get_items(items):
    print(len(items))
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # len(items) and len(wd.find_elements-by...()) both always seem to return the same number
    # if I were to start the loop with while True: it would work, but of course... never end
    while len(wd.find_elements_by_class_name('stream-item')) > len(items):
        items = wd.find_elements_by_class_name('stream-item')
        print(items)
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return items

def test():
    get_page('http://twitter.com/')
    get_items(wd.find_elements_by_class_name('stream-item'))

web-scraping selenium element-locating scroll-loading auto-loading

2 Answers

In my case, the condition in the while loop was the problem: it turned into an infinite loop. I fixed this by using a counter:

import time

def get_items(items):
    # Item counts seen so far, seeded with [0, 1] so the loop body runs at least once
    item_nb = [0, 1]

    # Exit the loop once a scroll no longer increases the number of items found
    while item_nb[-1] > item_nb[-2]:
        # Scroll first, then pause, so the count below includes the newly loaded items
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5)
        items = wd.find_elements_by_class_name('stream-item')

        item_nb.append(len(items))

    return items
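The same counter idea can be sketched without the seeded list, with a hard cap on the number of scrolls so the loop cannot run forever. This is only a sketch under a few assumptions: the name `scroll_until_no_new_items` and its `pause` and `max_rounds` parameters are made up for illustration, and `driver` is assumed to expose `execute_script` and the two-argument `find_elements(by, value)` form, where the string `"class name"` is the value of Selenium's `By.CLASS_NAME`.

```python
import time


def scroll_until_no_new_items(driver, class_name="stream-item",
                              pause=5.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the item count stops growing.

    Hypothetical helper: `driver` is any object with `execute_script` and
    `find_elements(by, value)`, such as a Selenium WebDriver.
    """
    items = driver.find_elements("class name", class_name)
    for _ in range(max_rounds):  # hard cap, so the loop always terminates
        driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude fixed pause while the next batch loads
        found = driver.find_elements("class name", class_name)
        if len(found) <= len(items):
            break  # the last scroll added nothing new
        items = found
    return items
```

Called as `scroll_until_no_new_items(wd)`, it returns the last, largest list of elements it saw. A fixed pause can still be too short on a slow connection, in which case the loop stops early.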


Answered 2025-04-17 by Python大师


Try adding a pause in between:

from time import sleep

from selenium import webdriver

wd = webdriver.Firefox()
wd.implicitly_wait(3)

def get_items(items):
    print(len(items))
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Without a pause, the count is re-read before the new items have loaded,
    # so the loop condition below is never true

    sleep(5)  # seconds
    while len(wd.find_elements_by_class_name('stream-item')) > len(items):
        items = wd.find_elements_by_class_name('stream-item')
        print(items)
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(5)  # pause after every scroll, not only the first one
    return items

def test():
    get_page('http://twitter.com/')
    get_items(wd.find_elements_by_class_name('stream-item'))

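Note: the hard-coded pause here is only for demonstration. Use Selenium's wait support (for example `WebDriverWait`) to wait on a smarter condition instead.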
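One way to picture such a smarter condition is to poll until the item count actually grows, giving up after a timeout. The sketch below hand-rolls that polling loop with the standard library so it stays self-contained; it is not Selenium's own implementation. Selenium's `WebDriverWait(driver, timeout).until(...)` offers the same poll-until-true pattern and raises `TimeoutException` instead of returning `None`. The names `wait_for_more_items` and `get_all_items`, and the `timeout` and `poll` defaults, are assumptions made for illustration, as is the two-argument `find_elements("class name", ...)` call.

```python
import time


def wait_for_more_items(driver, previous_count, class_name="stream-item",
                        timeout=10.0, poll=0.5):
    """Poll until more than `previous_count` items exist.

    Returns the new list of elements, or None if `timeout` seconds pass
    without the count growing. Hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        items = driver.find_elements("class name", class_name)
        if len(items) > previous_count:
            return items
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def get_all_items(driver, class_name="stream-item", timeout=10.0, poll=0.5):
    """Scroll and wait until a scroll yields no new items within `timeout`."""
    items = driver.find_elements("class name", class_name)
    while True:
        driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);")
        more = wait_for_more_items(driver, len(items), class_name,
                                   timeout, poll)
        if more is None:
            return items  # nothing new arrived in time: assume the end
        items = more
```

Compared with a fixed `sleep(5)`, this returns as soon as the new items appear and only pays the full timeout once, on the final scroll that loads nothing.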

Answered 2025-04-17 by Python大师

