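I want to scrape a name, but the output is None
I am scraping a website and want to extract the product name and price, but the output is None. I have no idea what I am doing wrong, since I expected to get the price and the name.
I tried the following code: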
# Packages needed
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Base website URL
baseurl = "https://www.lordgun.com/"

# List to store the links of the individual bikes that will be scraped later
productlinks = []
for x in range(1, 2):
    # Listing page from which the product links for further scraping are collected
    r = requests.get(f'https://www.lordgun.com/_road-mtb-bikes-frames?p={x}')
    soup = BeautifulSoup(r.content, 'lxml')
    # Find all the products on the page; these currently appear as "article" elements when inspected, but could also be "div"
    productlist = soup.find_all('article', class_="article product")
    # To test whether the script finds the different products
    # print(productlist)
    for product in productlist:
        for link in product.find_all('a', href=True):
            # Can be used to test whether the expected links are found, so the details can be fetched later
            # print(link['href'])
            productlinks.append(baseurl + link['href'])

# To test how many links were collected
print(len(productlinks))
# Test a single link before running the full loop, so that mistakes show up without a long wait; in this case it will be
testlink = 'https://www.lordgun.com/specialized-turbo-kenevo-comp-bike-emtb-1?color=Gloss%20Dark%20Moss%20Green%20Oak%20Green'
r = requests.get(testlink)
soup = BeautifulSoup(r.content, 'lxml')
name = soup.find('h1', name_='product title')
price = soup.find('div', class_= 'prd-price')
category = soup.find('span', itemprop_='name')
print(price)
print(name)
print(category)
1 Answer
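To get the first "h1" tag on the page, you can use the code below. Many websites, especially single-page ones, have only one "h1" tag, which helps with search engine optimization.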
product_title = soup.find('h1').text.strip()
print(product_title)
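A likely reason for the None results in the question is the keyword arguments name_ and itemprop_. In Beautiful Soup only class_ takes a trailing underscore (because class is a reserved word in Python). Any other keyword is matched literally, so itemprop_='name' looks for an attribute that is actually called "itemprop_". The sketch below shows this on a small hypothetical HTML snippet; the markup, the class names and the helper text_or_none are assumptions for illustration, not the real structure of the lordgun.com page, which should be checked in the browser inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product page; the real page's
# tags and class names may differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="product-title">Example Bike</h1>
  <div class="prd-price">1999.00</div>
  <span itemprop="name">Bikes</span>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# itemprop_ is taken literally as an attribute named "itemprop_",
# so nothing matches and find() returns None.
wrong = soup.find("span", itemprop_="name")

# Passing the filter through attrs uses the real attribute name.
category = soup.find("span", attrs={"itemprop": "name"})


def text_or_none(tag):
    """Return the stripped text of a tag, or None if find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


name = text_or_none(soup.find("h1"))
price = text_or_none(soup.find("div", class_="prd-price"))

print(wrong)
print(name, price, text_or_none(category))
```

Checking for None before reading the text also avoids an AttributeError when a tag is missing. If find() still returns None with the corrected arguments, the element may not be present in the HTML that requests receives, which is worth verifying by printing part of r.text.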