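I am trying to scrape links from the Play Store, which loads its results through infinite scrolling. Using Selenium and BeautifulSoup, I can only get the links on the first page. How do I go further and get the complete list of all links? This is what I have so far: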
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import re

urls = [
    'https://play.google.com/store/apps/collection/cluster?clp=0g4eChwKFnRvcHNlbGxpbmdfZnJlZV9EQVRJTkcQBxgD:S:ANO1ljJ0qxs&gsr=CiHSDh4KHAoWdG9wc2VsbGluZ19mcmVlX0RBVElORxAHGAM%3D:S:ANO1ljJvMRw&hl=en&gl=US'
]

def main():
    driver = webdriver.Chrome()
    for url in urls:
        driver.get(url)
        # Page as rendered by the browser (not used further below)
        content = driver.page_source.encode('utf-8').strip()
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
        # Separate plain HTTP request: it only sees the initial HTML, not items loaded by scrolling
        page = requests.get(url, headers=headers).content
        bs = BeautifulSoup(page, "html.parser")
        temp_link = []
        first_p = bs.find_all('a', {'class': 'poRVub'})
        for link in first_p:
            temp_link.append('https://play.google.com' + link['href'])
        more_link = bs.find_all('div', {'class': 'uMConb V2Vq5e POHYmb-eyJpod YEDFMc-eyJpod y1APZe-eyJpod drrice'})
        print(more_link)
You need to implement a way of scrolling that keeps going until Google has rendered all the items you need.
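One possible way to do this scrolling is sketched below. It is not the answer's original code. The helper name `scroll_to_bottom` and its `pause` and `max_rounds` parameters are hypothetical, and the sketch assumes that newly loaded items make `document.body.scrollHeight` grow, which may not hold for every Play Store layout.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    execute_script works). Returns how many scrolls loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the current bottom of the page to trigger lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the next batch
        time.sleep(pause)
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            # Height unchanged: assume nothing more will load
            break
        last_height = new_height
        loaded += 1
    return loaded
```

The `max_rounds` cap is there so the loop still ends on a page that never stops growing. A fixed `pause` is the simplest choice. An explicit wait on a specific element would likely be more robust if the connection is slow.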
Then call this scrolling method after loading the URL.
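The order of calls could look like the sketch below: load the page, scroll, and only then read the links. The function name `collect_links` and its `scroll` parameter (any callable that takes the driver) are hypothetical. The selector `a.poRVub` comes from the class name in the question and may well have changed since. Note that this sketch reads the `href` values through the browser with `execute_script` instead of re-parsing with BeautifulSoup. Passing `driver.page_source` to BeautifulSoup after scrolling should work as well.

```python
def collect_links(driver, url, scroll, selector="a.poRVub"):
    """Load `url`, run `scroll(driver)`, then return the unique link URLs.

    `driver` is assumed to be a Selenium WebDriver; `scroll` is whatever
    scrolling routine you use to make the page load all of its items.
    """
    driver.get(url)
    # Scroll first, so the DOM contains every lazily loaded item
    scroll(driver)
    script = (
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(a => a.href);"
    )
    # In the browser, a.href is already an absolute URL
    hrefs = driver.execute_script(script, selector)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(hrefs))
```

The key point is that the links are read from the same browser session that did the scrolling. A separate `requests.get` call, as in the question, only ever receives the initial HTML.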
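Note that you will need the corresponding Selenium imports, and that you also need to create a `wait` object after the driver has been initialized.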