I'm having some problems scraping a web page with BeautifulSoup

Posted 2024-04-23 07:22:41


When I try to use .text() to extract the text between the tags, I get a blank screen and the output is []

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.amazon.in/s?k=ssd&ref=nb_sb_noss")

soup = BeautifulSoup(page.content, "html.parser")

product = soup.find_all("h2",class_="a-link-normal a-text-normal")
results = soup.find_all("span",class_="a-offscreen")

print(product)

This is the output I get:

C:\Users\Kushal\Desktop\requests-tutorial>C:/Users/Kushal/AppData/Local/Programs/Python/Python37/python.exe c:/Users/Kushal/Desktop/requests-tutorial/scraper.py
[]

When I try to list everything with a for loop, nothing is printed at all, not even the empty square brackets


1 answer
User
#1 · Posted 2024-04-23 07:22:41

Based on your comment below, I have modified the code to fetch all the product titles and their prices from the page mentioned above

If this answer works, please mark it as accepted; otherwise leave a comment so we can look into it further
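One possible reason for the empty list, sketched offline: when class_ is given a string with several class names, BeautifulSoup compares it against the tag's exact class attribute, so an h2 that does not carry those classes itself is never matched. The HTML below is made-up markup for illustration only (the real page may be structured differently), and SAMPLE_HTML, no_match and titles are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT copied from the real page: the classes the
# question searched for sit on the <a>, not on the <h2>.
SAMPLE_HTML = """
<div class="s-result-item">
  <h2 class="a-size-mini">
    <a class="a-link-normal a-text-normal" href="#">
      <span class="a-size-medium a-color-base a-text-normal">Example SSD 240GB</span>
    </a>
  </h2>
  <span class="a-offscreen">INR 1,999</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The <h2> here does not have these classes, so nothing is found.
no_match = soup.find_all("h2", class_="a-link-normal a-text-normal")

# A multi-class string matches a tag whose class attribute is exactly that string.
matches = soup.find_all("span", class_="a-size-medium a-color-base a-text-normal")

# .text is an attribute of each Tag (not a method of the result list).
titles = [tag.text.strip() for tag in matches]

print(len(no_match), titles)
```

Looping over an empty result list runs zero times, which would explain why a for loop prints nothing at all.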

import requests
from bs4 import BeautifulSoup

# Browser-like request headers; without them the site may not return
# the normal search results page.
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-charset": "cp1254,ISO-8859-9,utf-8;q=0.7,*;q=0.3",
    "accept-encoding": "gzip,deflate,sdch",
    "accept-language": "tr,tr-TR,en-US,en;q=0.8",
}

response = requests.get('https://www.amazon.in/s?k=ssd&ref=nb_sb_noss', headers=headers)

# The 'lxml' parser needs the lxml package to be installed.
soup = BeautifulSoup(response.content, 'lxml')

titles = soup.find_all('span', attrs={'class': 'a-size-medium a-color-base a-text-normal'})
prices = soup.find_all('span', attrs={'class': 'a-offscreen'})

# zip pairs titles and prices by position and stops at the shorter list.
for title, price in zip(titles, prices):
    title_proper = title.text.strip()
    price_proper = price.text.strip()
    print(title_proper, '-', price_proper)
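Pairing two page-wide lists by position can misalign if some product has no price. A minimal sketch of one alternative, assuming each product sits in its own container element: look up the title and price inside each container. The markup, the s-result-item container class, and the extract_products helper are all hypothetical and only illustrate the idea on offline HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second product has no price.
SAMPLE_HTML = """
<div class="s-result-item">
  <span class="a-size-medium a-color-base a-text-normal">Example SSD 240GB</span>
  <span class="a-offscreen">INR 1,999</span>
</div>
<div class="s-result-item">
  <span class="a-size-medium a-color-base a-text-normal">Example SSD 480GB</span>
</div>
"""


def extract_products(html):
    """Return (title, price) tuples, one per product container.

    price is None when the container has no price element.
    """
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.find_all("div", class_="s-result-item"):
        # A single class name matches any tag that has it among its classes.
        title_tag = card.find("span", class_="a-text-normal")
        price_tag = card.find("span", class_="a-offscreen")
        if title_tag is None:
            continue  # skip containers that are not product entries
        title = title_tag.get_text(strip=True)
        price = price_tag.get_text(strip=True) if price_tag else None
        products.append((title, price))
    return products


for title, price in extract_products(SAMPLE_HTML):
    print(title, '-', price)
```

With real pages the container and class names would have to be checked in the browser's inspector first, since they are assumptions here.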
    
         
