BeautifulSoup prints None when scraping, even though the tags are correct

from bs4 import BeautifulSoup
import requests

def make_soup(url):
    html = requests.get(url)
    bsObj = BeautifulSoup(html.text, 'html.parser')
    return bsObj

soup = make_soup('https://www.zalora.com.hk/men/clothing/shirt/?gender=men&dir=desc&sort=popularity&category_id=31&enable_visual_sort=1')

itemBrand = soup.find("span",{"class":"b-catalogList__itmBrand fsm txtDark uc js-catalogProductTitle"})
itemName = soup.find("em",{"class":"b-catalogList__itmTitle fss"})
itemPrice = soup.find("span",{"class":"b-catalogList__itmPrice old"})
print(itemBrand, itemName, itemPrice)

1 Answer

User

#1 · Posted 2024-06-02 04:24:04

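This content is rendered by JavaScript, so it is not present in the HTML that the requests module receives, which is why every find() call returns None. You should use selenium to automate a browser and then use BeautifulSoup to parse the rendered page source.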
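A quick way to check this diagnosis is to see whether the class names occur in the raw response at all. The sketch below is only illustrative: `missing_markers` is a hypothetical helper, and `sample_html` is a made-up stand-in for `requests.get(url).text`, not the real page. A marker that is present is not proof the element exists (the string could sit inside a script), but a marker that is absent means BeautifulSoup has nothing to find.

```python
def missing_markers(raw_html, markers):
    """Return the markers that do not occur anywhere in raw_html."""
    return [m for m in markers if m not in raw_html]


# Made-up stand-in for requests.get(url).text: an empty shell that a
# script would later fill in inside a real browser.
sample_html = '<div id="app"></div><script src="bundle.js"></script>'

# Class-name fragments taken from the question's find() calls.
wanted = [
    "b-catalogList__itmBrand",
    "b-catalogList__itmTitle",
    "b-catalogList__itmPrice",
]

# Every marker is reported missing, so find() on this HTML would return None.
print(missing_markers(sample_html, wanted))
```

If the same check against `driver.page_source` reports nothing missing, the elements are being created by JavaScript after the initial response.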

Here is one way to do that with selenium:

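from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# Path to the ChromeDriver binary (placeholder; adjust for your machine)
chrome_driver = "path\\to\\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object;
# executable_path was deprecated in 4.0 and later removed
driver = webdriver.Chrome(service=Service(chrome_driver))

target = 'https://www.zalora.com.hk/men/clothing/shirt/?gender=men&dir=desc&sort=popularity&category_id=31&enable_visual_sort=1'
driver.get(target)

# Parse the HTML after the browser has run the page's JavaScript
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

print(soup.find("span",{"class":"b-catalogList__itmBrand fsm txtDark uc js-catalogProductTitle"}).get_text().strip())
print(soup.find("span", {'class': 'b-catalogList__itmPrice old'}).get_text().strip())
print(soup.find("em",{"class":"b-catalogList__itmTitle fss"}).get_text().strip())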

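Two details of the lookups above are worth guarding against. Calling `.get_text()` on the result of `find()` raises `AttributeError` whenever the element is missing, and passing the whole multi-class string to `find()` only matches when the `class` attribute is exactly that string. A minimal sketch of a more forgiving lookup, assuming a hypothetical helper `first_text` and made-up sample markup in place of `driver.page_source`:

```python
from bs4 import BeautifulSoup


def first_text(soup, selector):
    """Return the stripped text of the first element matching a CSS selector, or None."""
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# Made-up markup standing in for driver.page_source; note there is no price element.
sample = """
<div>
  <span class="b-catalogList__itmBrand fsm txtDark uc js-catalogProductTitle"> Brand X </span>
  <em class="b-catalogList__itmTitle fss">Slim Fit Shirt</em>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# A CSS class selector matches one class regardless of the others on the element.
brand = first_text(soup, "span.b-catalogList__itmBrand")
name = first_text(soup, "em.b-catalogList__itmTitle")
price = first_text(soup, "span.b-catalogList__itmPrice.old")  # missing, so None

print(brand, name, price)
```

Returning None for a missing element lets the caller decide how to handle it instead of crashing on the first product that lacks, say, an old price.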
