How do I extract hidden reviews from a website with Python and BeautifulSoup?

import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
URL = 'https://www.costco.com/blueair-healthprotect-7410i-hepasilent-ultra-air-purifier-with-germshield.product.100750915.html'

httpx = requests.get(URL, headers=headers)
# print(httpx.text)
soup = BeautifulSoup(httpx.content, 'html.parser')
for data in soup.findAll('span', class_='bv-content-datetime-stamp'):
    print(data)

2 Answers

User

#1 · Edited 2024-06-07 01:05:35

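This is not a static page: the reviews are loaded by separate requests after the page itself, so they are not in the HTML that requests.get returns. What you need to do is analyze the requests and responses. Here is a simplified way to handle your case.

Assuming you are using Chrome, press F12 to open devtools, go to the Network tab, and refresh the page.
Click Network again, press Ctrl+F to open the search pane, type a phrase from one of the reviews, such as "I purchased a Blueair", and press Enter.
The search results will include a batch.json request. Inspect its response and you will see where the data you need lives.

Next, analyze the request URL and the API parameters, then try sending the request to that API yourself. If all goes well, you should get the same batch.json back.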
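The batch.json response captured this way is JSONP when the request carries a callback parameter: the JSON is wrapped in a function call such as bv_351_53884(...). Below is a minimal sketch for unwrapping it without hard-coding the callback length. The helper name strip_jsonp and the sample payload are hypothetical, and the sketch assumes the wrapper always has the form name(...), optionally followed by a semicolon.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon.
_JSONP = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def strip_jsonp(text):
    """Parse a JSONP string; plain JSON is parsed unchanged."""
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


# Hypothetical sample shaped like the batch.json response described above.
sample = ('bv_351_53884({"BatchedResults": {"q0": {"Results": '
          '[{"ReviewText": "Great purifier"}]}}})')
data = strip_jsonp(sample)
first_review = data["BatchedResults"]["q0"]["Results"][0]["ReviewText"]
print(first_review)
```

With a real response, pass the response body text to strip_jsonp instead of the sample string.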

User

#2 · Edited 2024-06-07 01:05:35

Use the API with a limit parameter to fetch all the reviews

import requests
import json

# Number of reviews to request in one call
limit = 100
# The callback parameter makes the API answer with JSONP: bv_351_53884(<json>)
r = requests.get(f'https://api.bazaarvoice.com/data/batch.json?passkey=bai25xto36hkl5erybga10t99&apiversion=5.5&displaycode=2070_2_0-en_us&resource.q0=reviews&filter.q0=isratingsonly%3Aeq%3Afalse&filter.q0=productid%3Aeq%3A100750915&filter.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&sort.q0=relevancy%3Aa1&stats.q0=reviews&filteredstats.q0=reviews&include.q0=authors%2Cproducts%2Ccomments&filter_reviews.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_reviewcomments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_comments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&limit.q0={limit}&offset.q0=8&limit_comments.q0=3&callback=bv_351_53884')

# Drop the 13-character 'bv_351_53884(' prefix and the trailing ')' before parsing
comments = json.loads(r.text[13:-1])['BatchedResults']['q0']['Results']

# Print the text of the first review returned
print(comments[0]['ReviewText'])

This was a great purchase, beautifully packed, easy set-up, great app, sleek design, and very quiet.  The color bar shows the air quality being processed by the unit.

While I signed up for the auto-ship subscription service based on the Bluair statement that the app will analyze the filter condition and send a new filter right when it's needed.  However, after speaking with two Bluair employees, it seems that rather sending a new filter when needed, Bluair just sends a new filter every six months regardless of filter condition and use   certainly not high tech!

These are the query parameters you can adjust:

passkey: bai25xto36hkl5erybga10t99
apiversion: 5.5
displaycode: 2070_2_0-en_us
resource.q0: reviews
filter.q0: isratingsonly:eq:false
filter.q0: productid:eq:100750915
filter.q0: contentlocale:eq:en_CA,en_US
sort.q0: relevancy:a1
stats.q0: reviews
filteredstats.q0: reviews
include.q0: authors,products,comments
filter_reviews.q0: contentlocale:eq:en_CA,en_US
filter_reviewcomments.q0: contentlocale:eq:en_CA,en_US
filter_comments.q0: contentlocale:eq:en_CA,en_US
limit.q0: 30
offset.q0: 38
limit_comments.q0: 3
callback: bv_351_54703
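As a rough sketch of how these parameters could be adjusted in code, the example below builds the query from a list of key/value pairs (filter.q0 appears several times, so a dict would drop entries) and steps offset.q0 forward to page through the reviews. The function names build_params and fetch_all_reviews, the max_pages cap, and the stop condition are assumptions of this sketch. It also assumes that leaving out callback makes the API return plain JSON, which is not confirmed by the answer above. It uses only the standard library; requests.get(API_URL, params=...) accepts the same list of pairs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.bazaarvoice.com/data/batch.json"


def build_params(product_id, limit=30, offset=0):
    """Return query parameters as (key, value) pairs; filter.q0 repeats."""
    return [
        ("passkey", "bai25xto36hkl5erybga10t99"),
        ("apiversion", "5.5"),
        ("displaycode", "2070_2_0-en_us"),
        ("resource.q0", "reviews"),
        ("filter.q0", "isratingsonly:eq:false"),
        ("filter.q0", f"productid:eq:{product_id}"),
        ("filter.q0", "contentlocale:eq:en_CA,en_US"),
        ("sort.q0", "relevancy:a1"),
        ("limit.q0", str(limit)),
        ("offset.q0", str(offset)),
    ]


def fetch_all_reviews(product_id, limit=100, max_pages=50):
    """Page through the reviews until a page comes back short (assumed stop rule)."""
    reviews = []
    for page in range(max_pages):
        query = urlencode(build_params(product_id, limit, page * limit))
        # Assumption: without a callback parameter the body is plain JSON.
        with urlopen(f"{API_URL}?{query}", timeout=10) as resp:
            batch = json.load(resp)["BatchedResults"]["q0"]["Results"]
        reviews.extend(batch)
        if len(batch) < limit:
            break
    return reviews


# Building the query string does not touch the network.
query = urlencode(build_params("100750915", limit=30, offset=38))
print(query)
```

Calling fetch_all_reviews("100750915") would then issue the actual requests; the passkey and displaycode are the ones shown in the list above and may stop working if the site changes them.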
