Trying to scrape a Lazada product page

Posted 2024-04-27 01:00:55

Hi, I'm trying to scrape a product on lazada.com.ph, but when I try to load the link I get an error that looks like this:

We have detected unusual traffic from your network, please try again later.

I can't even start writing my code, because this gets in the way. Here is what I have so far:

from requests_html import HTMLSession
from bs4 import BeautifulSoup

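def getData(url):
    # Fetch the page, then execute its JavaScript in headless Chromium
    session = HTMLSession()
    response = session.get(url)
    response.html.render(sleep=1)
    # Parse the rendered HTML and print its visible text
    soup = BeautifulSoup(response.html.html, 'lxml')
    print(soup.text)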

lazada = 'https://www.lazada.com.ph/products/poco-x3-pro-8gb-ram-256gb-rom-android-smartphone-i1897541032-s8049001075.html?spm=a2o4l.home.flashSale.2.568f359dVjXEgp&search=1&mp=1&c=fs&clickTrackInfo=%7B%22rs%22%3A%220.07556916303173189%22%2C%22prior_score%22%3A%220%22%2C%22submission_discount%22%3A%2213%25%22%2C%22iss%22%3A%220.07556916303173189%22%2C%22type%22%3A%22entrance%22%2C%22prior_type%22%3A%22racing%22%2C%22userid%22%3A%22%22%2C%22sca%22%3A%22129%22%2C%22hourtonow%22%3A%2215%22%2C%22abid%22%3A%22178638%22%2C%22itemid%22%3A%221897541032_0_racing_0.07556916303173189_0.07556916303173189%22%2C%22pvid%22%3A%22fb0dc67d-d4da-4a00-8875-f372cd6be63a%22%2C%22pos%22%3A%220%22%2C%22rms%22%3A%220.0%22%2C%22c2i%22%3A%220.0%22%2C%22scm%22%3A%221007.17760.178638.%22%2C%22ss%22%3A%220.07556916303173189%22%2C%22i2i%22%3A%220.0%22%2C%22ms%22%3A%220.07556916303173189%22%2C%22itr%22%3A%220.13743589743589743%22%2C%22mt%22%3A%22racing%22%2C%22its%22%3A%221950%22%2C%22promotion_price%22%3A%2213990.00%22%2C%22anonid%22%3A%22dacd613a-5ef7-417e-81af-9cbca55c0971%22%2C%22FinalScore%22%3A%220.053083501756191254%22%2C%22isc%22%3A%22268%22%2C%22iss2%22%3A%220.5778039211824528%22%2C%22data_type%22%3A%22flashsale%22%2C%22iss1%22%3A%220.016712397106510353%22%2C%22config%22%3A%22%22%2C%22HP_score%22%3A%220.053083501756191254%22%2C%22channel_id%22%3A%220000%22%7D&scm=1007.17760.178638.0'
getData(lazada)

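As you can see, this is just a simple parse-and-print. I'm using the requests-html library to simplify handling the headers. An article I read mentioned something about a script, but I don't know where to look because I'm fairly new to this. It seems the site is blocking me from scraping it.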


1 Answer
User
#1 · Posted 2024-04-27 01:00:55

Lazada has taken measures to prevent people from scraping it (successfully, in your case).
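Before changing anything else, it can help to confirm that you received the block page rather than a product page that merely failed to parse. The following is a minimal sketch, assuming the block page always contains the phrase quoted in the question; `is_blocked` and `BLOCK_PHRASE` are hypothetical names, not part of any library.

```python
# Hypothetical helper: detect the block page by the phrase quoted in the question.
# Assumption: the phrase appears (ignoring case) on every block page.
BLOCK_PHRASE = "we have detected unusual traffic"


def is_blocked(page_text: str) -> bool:
    """Return True if the text looks like the anti-bot page, not a product page."""
    return BLOCK_PHRASE in page_text.lower()


if __name__ == "__main__":
    sample = "We have detected unusual traffic from your network, please try again later."
    print(is_blocked(sample))
    print(is_blocked("POCO X3 Pro 8GB RAM 256GB ROM"))
```

In the question's code you could call `is_blocked(soup.text)` before printing, so a blocked response is reported explicitly instead of being mistaken for page content.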

Your goal should not be to send a bare request to the site, but to mimic a real browser as closely as possible.
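As a rough illustration of what "mimicking a browser" can mean at the simplest level, the sketch below builds a request that carries browser-style headers. It uses only the standard library and does not send anything. The header values are assumptions chosen for illustration (real browsers send more headers, and the values change between versions), and `build_browser_like_request` is a hypothetical helper name.

```python
from urllib.request import Request

# Assumed, illustrative header values modelled on a desktop Chrome session.
# They are not taken from the original post and may need updating.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_browser_like_request(url: str) -> Request:
    """Build (but do not send) a GET request carrying browser-style headers."""
    return Request(url, headers=BROWSER_HEADERS)


if __name__ == "__main__":
    req = build_browser_like_request("https://www.lazada.com.ph/")
    # Show the headers that would be sent with this request.
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

With requests-html, the same dictionary can be passed as `session.get(url, headers=BROWSER_HEADERS)`, since `HTMLSession` is a requests session. Headers alone may well not be enough here: a site with this kind of protection may also look at cookies, request rate, or JavaScript behaviour.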

Since you are new to this topic, try an easier target first, such as your local newspaper's website.

Also, someone on the internet has usually done this already. Luckily, a quick search turns up this gem :)

https://kenciceron45.wixsite.com/krontek/post/can-we-scrape-it-eps-3-lazada
