How can I extract hidden reviews from a website using Python BeautifulSoup?

Posted 2024-06-07 01:05:35


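I want to extract all the review details (reviewer name, date, review text, and so on) for the Blueair product on the following page: https://www.costco.com/blueair-healthprotect-7410i-hepasilent-ultra-air-purifier-with-germshield.product.100750915.html The reviews appear to be hidden and loaded with JavaScript. This is what I have tried: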

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}
URL = 'https://www.costco.com/blueair-healthprotect-7410i-hepasilent-ultra-air-purifier-with-germshield.product.100750915.html'
response = requests.get(URL, headers=headers)
# print(response.text)
soup = BeautifulSoup(response.content, 'html.parser')
# The reviews seem to be added by JavaScript, so these spans may be missing from the raw HTML
for data in soup.find_all('span', class_='bv-content-datetime-stamp'):
    print(data)



2 answers

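This is not a static web page, so what you need to do is analyze the requests and responses. Here is a simplified way to handle your case:

  • Assuming you are using Chrome, press F12 to open DevTools, go to the Network tab, and refresh the page.
  • Click Network again, press Ctrl+F to open the search pane, type a phrase from one of the reviews (for example "I purchased a Blueair"), and press Enter.
  • The search results will show a request for batch.json. Inspect its response and you will see where the data you need comes from.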

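The next step is to analyze the request URL and the API parameters, then try sending the same request to that API yourself. If all goes well, you should get back the same batch.json.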
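One way to analyze a request URL copied from DevTools is to split it into its decoded query parameters. The sketch below uses only the standard library; `list_query_params` is a hypothetical helper name, and `captured` is an abbreviated version of the batch.json URL from this thread, not the full request.

```python
from urllib.parse import urlsplit, parse_qsl


def list_query_params(url):
    """Return the decoded (name, value) pairs of a URL's query string, in order."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


# Abbreviated example URL (assumption: shortened from the request captured in DevTools)
captured = (
    "https://api.bazaarvoice.com/data/batch.json"
    "?passkey=bai25xto36hkl5erybga10t99&apiversion=5.5"
    "&resource.q0=reviews&filter.q0=productid%3Aeq%3A100750915"
    "&limit.q0=30&offset.q0=38"
)

for name, value in list_query_params(captured):
    print(f"{name}: {value}")
```

Repeated names such as filter.q0 are kept as separate pairs, which is why a list of tuples is used here rather than a dict.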

Fetching the reviews from the API with a limit:

import requests
import json

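# Reviews per request. Note that the URL below hardcodes offset.q0=8, so the first 8 reviews are skipped.
limit = 100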
r = requests.get(f'https://api.bazaarvoice.com/data/batch.json?passkey=bai25xto36hkl5erybga10t99&apiversion=5.5&displaycode=2070_2_0-en_us&resource.q0=reviews&filter.q0=isratingsonly%3Aeq%3Afalse&filter.q0=productid%3Aeq%3A100750915&filter.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&sort.q0=relevancy%3Aa1&stats.q0=reviews&filteredstats.q0=reviews&include.q0=authors%2Cproducts%2Ccomments&filter_reviews.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_reviewcomments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_comments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&limit.q0={limit}&offset.q0=8&limit_comments.q0=3&callback=bv_351_53884')

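# The response is JSONP, e.g. bv_351_53884({...}), so strip the callback wrapper before parsing
payload = r.text[r.text.index('(') + 1:r.text.rindex(')')]
comments = json.loads(payload)['BatchedResults']['q0']['Results']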

print(comments[0]['ReviewText'])
This was a great purchase, beautifully packed, easy set-up, great app, sleek design, and very quiet.  The color bar shows the air quality being processed by the unit.

While I signed up for the auto-ship subscription service based on the Bluair statement that the app will analyze the filter condition and send a new filter right when it's needed.  However, after speaking with two Bluair employees, it seems that rather sending a new filter when needed, Bluair just sends a new filter every six months regardless of filter condition and use   certainly not high tech!
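Since the question asks for the reviewer name, date, and review text, each entry in the results list can be reduced to just those fields. This is a minimal sketch: `ReviewText` is the key used in the code above, while `UserNickname`, `SubmissionTime`, `Rating`, and `Title` are assumed key names that should be checked against the batch.json you actually receive. `summarize_review` is a hypothetical helper name.

```python
def summarize_review(review):
    """Pick the fields of interest from one review dict.

    Only 'ReviewText' is confirmed by the thread; the other keys are
    assumptions. dict.get returns None when a key is absent.
    """
    return {
        "name": review.get("UserNickname"),
        "date": review.get("SubmissionTime"),
        "rating": review.get("Rating"),
        "title": review.get("Title"),
        "text": review.get("ReviewText"),
    }


# Made-up sample entry, only to show the shape of the result
sample = {"UserNickname": "someone", "ReviewText": "Very quiet.", "Rating": 5}
print(summarize_review(sample))
```

With the `comments` list from the code above, `[summarize_review(c) for c in comments]` would give one small dict per review.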

These are the query parameters you can adjust:

passkey: bai25xto36hkl5erybga10t99
apiversion: 5.5
displaycode: 2070_2_0-en_us
resource.q0: reviews
filter.q0: isratingsonly:eq:false
filter.q0: productid:eq:100750915
filter.q0: contentlocale:eq:en_CA,en_US
sort.q0: relevancy:a1
stats.q0: reviews
filteredstats.q0: reviews
include.q0: authors,products,comments
filter_reviews.q0: contentlocale:eq:en_CA,en_US
filter_reviewcomments.q0: contentlocale:eq:en_CA,en_US
filter_comments.q0: contentlocale:eq:en_CA,en_US
limit.q0: 30
offset.q0: 38
limit_comments.q0: 3
callback: bv_351_54703
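The parameters above can be assembled programmatically and combined with limit.q0 and offset.q0 to page through the reviews. The following is a sketch under assumptions: `build_review_url` and `collect_all_reviews` are hypothetical helpers, only a subset of the listed parameters is included, and whether the API accepts that subset (or returns plain JSON when callback is omitted) has not been verified here. The page fetcher is passed in as a function so that the paging logic stays independent of how the request is sent.

```python
from urllib.parse import urlencode

API_URL = "https://api.bazaarvoice.com/data/batch.json"


def build_review_url(product_id, limit, offset,
                     passkey="bai25xto36hkl5erybga10t99"):
    """Build a batch.json URL from a subset of the parameters listed above."""
    params = [
        ("passkey", passkey),
        ("apiversion", "5.5"),
        ("displaycode", "2070_2_0-en_us"),
        ("resource.q0", "reviews"),
        # filter.q0 appears several times, hence a list of tuples
        ("filter.q0", "isratingsonly:eq:false"),
        ("filter.q0", f"productid:eq:{product_id}"),
        ("filter.q0", "contentlocale:eq:en_CA,en_US"),
        ("sort.q0", "relevancy:a1"),
        ("include.q0", "authors,products,comments"),
        ("limit.q0", str(limit)),
        ("offset.q0", str(offset)),
    ]
    return API_URL + "?" + urlencode(params)


def collect_all_reviews(fetch_page, limit=100, max_pages=50):
    """Call fetch_page(offset, limit) until a page comes back short.

    fetch_page must return a list of reviews for that offset.
    max_pages is a safety cap so the loop always terminates.
    """
    reviews = []
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        reviews.extend(batch)
        if len(batch) < limit:
            break
    return reviews
```

Here `fetch_page` could wrap `requests.get(build_review_url('100750915', limit, offset))` and return the `['BatchedResults']['q0']['Results']` list, parsed the same way as in the code earlier in this answer.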
