Clean scraped results to return the anchor text, not the HTML

import requests
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup

url = 'https://www.prohockeylife.com/collections/senior-hockey-sticks'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
stick_names = soup.find_all(class_='product-title')
stick_prices = soup.find_all(class_='regular-product')
print(stick_prices)

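2 Answers

User

#1 · edited 2024-06-16 10:44:42

You need the .text attribute, which you can extract inside a list comprehension. Wrapping zip in list at the end then gives you a list of (name, price) tuples.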

import requests
from bs4 import BeautifulSoup

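url = 'https://www.prohockeylife.com/collections/senior-hockey-sticks'
headers = {'user-agent': 'Mozilla/5.0'}
# Pass the headers, otherwise the custom user-agent is never sent.
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
# .text returns only the text inside each tag; strip() trims surrounding whitespace.
stick_names = [item.text.strip() for item in soup.find_all(class_='product-title')]
stick_prices = [item.text.strip() for item in soup.find_all(class_='regular-product')]
# Pair each name with its price by position.
print(list(zip(stick_names, stick_prices)))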
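
One thing to watch with this approach: zip stops at the shorter list, so if some product has a title but no element with class regular-product (a sale item, say, which is only a guess about this site), names and prices can drift out of alignment without any error. A minimal sketch of a length check follows. It uses hard-coded sample lists in place of the scraped ones, and pair_names_and_prices is a hypothetical helper, not part of requests or BeautifulSoup.

```python
def pair_names_and_prices(names, prices):
    """Pair names with prices by position, refusing to guess when the counts differ."""
    if len(names) != len(prices):
        raise ValueError(
            f"found {len(names)} names but {len(prices)} prices; "
            "the two lists cannot be paired by position"
        )
    return list(zip(names, prices))


# Sample values standing in for the scraped lists (hypothetical stand-ins).
stick_names = [
    "CCM RIBCOR TRIGGER 3D SR HOCKEY STICK",
    "BAUER VAPOR 1X LITE SR HOCKEY STICK",
]
stick_prices = ["$319.99", "$339.99"]

pairs = pair_names_and_prices(stick_names, stick_prices)
print(pairs)
```

If the counts do differ, a more robust route is usually to loop over each product's container element and read the name and price inside it, but that depends on the page's markup, which is not shown here.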

User

#2 · edited 2024-06-16 10:44:42

I'm not sure, but I think this is what you need:

Instead of print(stick_prices), use:

# stick_names and stick_prices are the tag lists from the question's code.
for name, price in zip(stick_names, stick_prices):
    print(name["href"], name.text, price.text)

The output begins:

    /collections/senior-hockey-sticks/products/ccm-ribcor-trigger-3d-sr-hockey-stick 

        CCM RIBCOR TRIGGER 3D SR HOCKEY STICK     

$319.99

/collections/senior-hockey-sticks/products/bauer-vapor-1x-lite-sr-hockey-stick 

        BAUER VAPOR 1X LITE SR HOCKEY STICK


$339.99

and so on.
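
The stray whitespace in that output comes from .text, which returns the text exactly as it sits in the markup. Below is a minimal sketch of tidying each row and turning the price into a number, assuming every price looks like $319.99. clean_row is a hypothetical helper, and the sample strings are modeled on the output above.

```python
from decimal import Decimal


def clean_row(href, name, price):
    """Trim whitespace and convert a '$319.99'-style price string to Decimal."""
    # split() followed by join() collapses runs of whitespace and trims both ends.
    clean_name = " ".join(name.split())
    # Assumes a leading dollar sign and optional thousands separators.
    amount = Decimal(price.strip().lstrip("$").replace(",", ""))
    return href.strip(), clean_name, amount


row = clean_row(
    "/collections/senior-hockey-sticks/products/ccm-ribcor-trigger-3d-sr-hockey-stick ",
    "\n        CCM RIBCOR TRIGGER 3D SR HOCKEY STICK     \n",
    "$319.99",
)
print(row)
```

Decimal is used rather than float so that prices keep their exact two-decimal value when summed or compared.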
