I want to scrape a name, but the output is None

Asked 2025-04-14 16:21

[Screenshot: browser inspector showing the price element]

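I am scraping a website and want to extract the product name and price, but every value comes out as None. I have no idea where I went wrong, because I expected to get the price and the name.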

Page: https://www.lordgun.com/specialized-turbo-kenevo-comp-bike-emtb-1?color=Gloss%20Dark%20Moss%20Green%20Oak%20Green

I tried the following code:

#Packages needed
from bs4 import BeautifulSoup
import requests
import pandas as pd

#Base website URL
baseurl = "https://www.lordgun.com/"

#New list to store all the links of the different bikes that will later be scraped
productlinks =[]

for x in range(1,2):
  ## Listing page to scrape: it provides the links used for the detailed scraping later on
  r = requests.get(f'https://www.lordgun.com/_road-mtb-bikes-frames?p={x}')
  soup = BeautifulSoup(r.content, 'lxml')

  ##Searching for all the products on the webpage, these are currently defined as an "article" when inspected but could also be "div"
  productlist = soup.find_all('article', class_="article product")

  ##To test if the script finds the different products
  #print(productlist)

  for product in productlist:
    for link in product.find_all('a', href=True):
      ## Possible to test whether the links needed for the detail scraping are being found
      #print(link['href'])
      productlinks.append(baseurl + link['href'])

## To test how many links are being created
print(len(productlinks))

## Test a single link first, so mistakes show up before waiting for the full loop to run; in this case:
testlink = 'https://www.lordgun.com/specialized-turbo-kenevo-comp-bike-emtb-1?color=Gloss%20Dark%20Moss%20Green%20Oak%20Green'
r = requests.get(testlink)

soup = BeautifulSoup(r.content, 'lxml')

name = soup.find('h1', name_='product title')
price = soup.find('div', class_= 'prd-price')
category = soup.find('span', itemprop_='name')

print(price)
print(name)
print(category)
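One likely cause of the None values, independent of the site itself: BeautifulSoup only translates the keyword class_ into a search on the class attribute. Other keywords with a trailing underscore, such as name_ and itemprop_, are treated as literal attribute names ("name_", "itemprop_"), which no element has, so find() returns None. The minimal offline sketch below demonstrates this on hypothetical markup. The tags, classes and text values are assumptions and may not match the real lordgun.com page; if that page is rendered by JavaScript or served differently to requests, the elements may not be in r.content at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the product page; the real page may differ.
html = """
<h1 class="product title">Specialized Turbo Kenevo Comp</h1>
<div class="prd-price">4999.00 EUR</div>
<span itemprop="name">E-MTB</span>
"""
soup = BeautifulSoup(html, "html.parser")

# name_ and itemprop_ are searched as literal attribute names, so nothing matches.
print(soup.find("h1", name_="product title"))  # None
print(soup.find("span", itemprop_="name"))     # None

# class_ is special-cased by BeautifulSoup; other attributes go through attrs={...}.
title = soup.find("h1", class_="product title")
price = soup.find("div", class_="prd-price")
category = soup.find("span", attrs={"itemprop": "name"})

print(title.get_text(strip=True))
print(price.get_text(strip=True))
print(category.get_text(strip=True))
```

The attrs dictionary is the general form: it works for any attribute name, including ones that cannot be written as Python keywords (for example data-* attributes).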

1 Answer


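To get the first "h1" tag on the page, you can use this code. Many pages, especially single product pages, contain only one "h1" tag, because that is common practice for search engine optimization.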

product_title = soup.find('h1').text.strip() 
print(product_title)
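Note that soup.find('h1') returns None when the downloaded HTML has no h1 element (for example if the page requests receives differs from what the browser shows), and calling .text on None raises AttributeError. A small guard avoids that. The helper name text_or_none below is hypothetical, and the markup is an assumed stand-in for the real page; the sketch uses BeautifulSoup's select_one with CSS selectors, which is an alternative to find(), not what the answer above uses.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first element matching a CSS selector, or None.

    Hypothetical helper: returns None instead of raising AttributeError
    when the element is missing from the downloaded HTML.
    """
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup standing in for the downloaded product page.
html = '<h1 class="product title"> Specialized Turbo Kenevo Comp </h1>'
soup = BeautifulSoup(html, "html.parser")

print(text_or_none(soup, "h1"))             # Specialized Turbo Kenevo Comp
print(text_or_none(soup, "div.prd-price"))  # None: no such element in this snippet
```

A None from this helper is a signal to inspect r.content directly, since it shows what requests actually received rather than what the browser inspector displays.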
