Python BeautifulSoup: how do I capture text while continuing to scroll down? (web scraping)

while True:
    url = 'https://xxxxxxxxx/{}'.format(pagenum)
    driver.get(url)
    pagesource = driver.page_source
    soup = BeautifulSoup(pagesource, 'lxml')
    if url == "https://xxxxxxxxxx/5":
        break
    else:
        for s in soup.find_all("div", class_="_2cNsJna0_hV8tdMj3X6_gJ"):
            for j in s:
                # searching only for "Sheeran" works, but if I change it to
                # "Sheeran" or "concert", the results come out at random
                if "Sheeran" in j:
                    print(s.text)
    pagenum += 1
    time.sleep(2)

1 answer

Anonymous user

#1 · Posted 2024-04-25 14:13:47

Another approach is to find out how the site fetches its content as you scroll.

You can try incrementing the page number in a loop and stopping once a page no longer contains valid content.

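import re

from bs4 import BeautifulSoup

# `driver` is assumed to be an already-created Selenium WebDriver
pagenum = 1
while True:
    url = 'https://lihkg.com/thread/1082050/page/{}'.format(pagenum)
    driver.get(url)
    pagesource = driver.page_source
    soup = BeautifulSoup(pagesource, 'lxml')
    # a valid page contains at least one link to a user profile
    profile_links = soup.find('a', attrs={'href': re.compile('/profile')})
    if not profile_links:
        break
    # page is valid, continue with code to extract results
    pagenum += 1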
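
For the extraction step, the question mentions that testing for "Sheeran" or "concert" gave seemingly random results. If the condition was written as `"Sheeran" or "concert" in j` (a guess, since that version of the code is not shown), Python reads it as `"Sheeran" or ("concert" in j)`, which is always truthy, so every element gets printed. Below is a minimal sketch of a keyword filter that avoids this. The helper name `matching_texts` and the sample strings are made up for illustration.

```python
def matching_texts(texts, keywords):
    """Return the strings in `texts` that contain at least one keyword."""
    return [t for t in texts if any(k in t for k in keywords)]


# Pitfall: this expression is truthy whatever `text` contains,
# because it parses as "Sheeran" or ("concert" in text).
text = "nothing relevant here"
always_true = bool("Sheeran" or "concert" in text)

# Hypothetical sample data standing in for the text of each scraped <div>.
sample = ["Ed Sheeran tickets", "weather today", "concert tomorrow"]
print(matching_texts(sample, ["Sheeran", "concert"]))
```

Inside the loop, the same helper could be fed the text of each matching element, for example `matching_texts((s.text for s in soup.find_all("div", class_="_2cNsJna0_hV8tdMj3X6_gJ")), ["Sheeran", "concert"])`.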

Or use the API URL that shows up in the network traffic (for example, in the Network tab of the browser's developer tools).
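
A rough sketch of the API route, under heavy assumptions: the real endpoint and the shape of its response are not given here, so `fetch_page` is a hypothetical callable you would write yourself (for instance, a function that requests the API URL for a given page and returns the parsed list of posts). It is assumed to return an empty list once the page number runs past the end.

```python
def collect_all_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    items = []
    for pagenum in range(1, max_pages + 1):
        page_items = fetch_page(pagenum)
        if not page_items:
            # An empty page is taken to mean there is nothing more to fetch.
            break
        items.extend(page_items)
    return items


# Stand-in for a real API call, used only to show the control flow.
def fake_fetch_page(pagenum):
    data = {1: ["post a", "post b"], 2: ["post c"]}
    return data.get(pagenum, [])


print(collect_all_pages(fake_fetch_page))
```

The `max_pages` cap is only a safety net so the loop cannot run forever if the stop condition never triggers.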
