Adding BeautifulSoup to ZenRows
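I've just started learning to program, and for now I'm only following solutions I find online.
I ran into a problem while scraping this page: https://www.oneroof.co.nz/property/auckland/bucklands-beach/18-camwell-close/H3rQw
I want to get the OneRoof estimate (the element with class='text-xl md:hidden text-secondary font-bold'). I think the site has anti-scraping protection, so I found ZenRows to get around it.
I managed to retrieve the HTML, but I don't know how to extract the value I want from it.
I tried combining ZenRows with BeautifulSoup, but I'm not sure whether this is the right approach.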
# pip install zenrows
from zenrows import ZenRowsClient
client = ZenRowsClient("daf1d79772a751a7d680055ed81a0d5b47cb2bb6")
url = "https://www.oneroof.co.nz/property/auckland/bucklands-beach/18-camwell-close/H3rQw"
params = {"js_render":"true","json_response":"true","js_instructions":"%5B%7B%22click%22%3A%22.selector%22%7D%2C%7B%22wait%22%3A500%7D%2C%7B%22fill%22%3A%5B%22.input%22%2C%22value%22%5D%7D%2C%7B%22wait_for%22%3A%22.slow_selector%22%7D%5D"}
response = client.get(url, params=params)
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
mydivs = str(soup.find(class_="text-xl md:hidden text-secondary font-bold"))
print(mydivs)
But this prints "None".
1 Answer
You can convert the BeautifulSoup output into an lxml HTML DOM and then use XPath to get the value you want. Here is the relevant code snippet.
# pip install zenrows
from zenrows import ZenRowsClient
client = ZenRowsClient("daf1d79772a751a7d680055ed81a0d5b47cb2bb6")
url = "https://www.oneroof.co.nz/property/auckland/bucklands-beach/18-camwell-close/H3rQw"
params = {"js_render":"true","json_response":"true","js_instructions":"%5B%7B%22click%22%3A%22.selector%22%7D%2C%7B%22wait%22%3A500%7D%2C%7B%22fill%22%3A%5B%22.input%22%2C%22value%22%5D%7D%2C%7B%22wait_for%22%3A%22.slow_selector%22%7D%5D"}
response = client.get(url, params=params)
# pip install beautifulsoup4 lxml
from bs4 import BeautifulSoup
from lxml import etree
bs = BeautifulSoup(response.text, 'html.parser')
# converting BeautifulSoup response to HTML DOM
dom = etree.HTML(str(bs))
# Use xpath1 for the full value ($1,750,000) or xpath2 for the abbreviated value ($1.75M)
# xpath1 = "//span[contains(text(),'OneRoof estimate')]/following-sibling::span[1]"
xpath2 = "//span[contains(text(),'OneRoof estimate')]/following-sibling::span[2]"
nodes = dom.xpath(xpath2)
# xpath() returns a list, which is empty when nothing matches
estimation = nodes[0].text.strip() if nodes else None
print(estimation)
Output: $1.75M
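If you would rather stay with BeautifulSoup alone, the same "find the label, then take its sibling spans" idea can be sketched without lxml. This is only a rough sketch under some assumptions: that with "json_response":"true" the body is a JSON object whose rendered page sits under an "html" key, and that the page markup looks like the hypothetical sample below, which is modelled on the XPath above rather than copied from the real site. The name extract_estimate is made up for illustration.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text when json_response is "true".
# Both the "html" key and this markup are assumptions for illustration.
sample_payload = json.dumps({
    "html": (
        "<div>"
        "<span>OneRoof estimate</span>"
        "<span>$1,750,000</span>"
        '<span class="text-xl md:hidden text-secondary font-bold">$1.75M</span>'
        "</div>"
    )
})


def extract_estimate(payload_text):
    """Return (full, short) estimate strings, or None if the label is missing."""
    # Pull the HTML string out of the JSON wrapper before parsing it.
    html = json.loads(payload_text)["html"]
    soup = BeautifulSoup(html, "html.parser")
    # Locate the label span by its text instead of by its CSS classes.
    label = soup.find("span", string=lambda s: s and "OneRoof estimate" in s)
    if label is None:
        return None
    # The next two sibling spans play the role of xpath1 and xpath2.
    siblings = label.find_next_siblings("span", limit=2)
    return tuple(s.get_text(strip=True) for s in siblings)


print(extract_estimate(sample_payload))
```

With the real response you would pass response.text instead of sample_payload. Searching by the label text tends to hold up better than matching the full class string, since utility class names like these change often.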