Python BeautifulSoup: problem extracting data

Posted 2024-05-12 17:47:02


I want to extract some data and put it into Excel.

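My problem is the extraction: I don't get all the information that I see with Inspect Element. In the inspector I can see every item with its brand, mileage, price, etc. All of this information is present in what I extract, but it is all inside scripts and does not look like what I see on the website.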

URL I am scraping: https://www.alcopa-auction.fr/salle-de-vente-encheres/nancy/2110

import requests
from bs4 import BeautifulSoup

URL = 'https://www.alcopa-auction.fr/salle-de-vente-encheres/nancy/2110'

# Browser-like User-Agent, sent with the single request below
headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}

page = requests.get(URL, headers=headers)

# Parse the raw HTML returned by the server (no JavaScript is executed)
soup = BeautifulSoup(page.content, 'html.parser')

print(soup)
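The answer explains that these values are embedded as JSON inside a script. For reference, here is a minimal sketch of what that looks like from BeautifulSoup's side, run on a hypothetical, shortened HTML snippet. `SAMPLE_HTML` and its contents are made up for illustration; only the variable name `window.Alcopa.searchResultsJSONString` comes from the answer.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page source. Per the answer, the
# real page assigns the listing to window.Alcopa.searchResultsJSONString;
# the markup and fields here are invented for illustration.
SAMPLE_HTML = r"""
<html><body>
<h1>Vente Nancy</h1>
<script>
window.Alcopa.searchResultsJSONString = '{\"car\": [{\"brand\": \"RENAULT\", \"sale_date\": \"22/01/2020\"}]}';
</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Locate the <script> element whose text mentions the variable.
script = soup.find('script', string=re.compile(r'searchResultsJSONString'))

# Capture the contents of the single-quoted JavaScript string.
match = re.search(r"searchResultsJSONString = '(.*)';", script.string)

# Decode twice: first the string-literal escapes (\"), then the JSON itself.
data = json.loads(json.loads('"{}"'.format(match.group(1))))

for car in data['car']:
    print(car['brand'], car['sale_date'])
```

Against the live page you would pass `page.content` from the code above instead of `SAMPLE_HTML`; if the site changes how it embeds the data, `script` will be `None`.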

1 answer
User
Reply #1 · posted 2024-05-12 17:47:02

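The data you see on the page is embedded in the page source as JSON inside a script, which is why your BeautifulSoup output shows it as script content rather than as the elements you see in the inspector. You can extract it, for example, like this: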

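import re
import json
import requests

url = 'https://www.alcopa-auction.fr/salle-de-vente-encheres/nancy/2110'

txt = requests.get(url).text

# The listing is assigned to a JavaScript string literal inside a <script> tag
match = re.search(r"window\.Alcopa\.searchResultsJSONString = '(.*)';", txt)
if match is None:
    raise ValueError('searchResultsJSONString not found in the page')
json_string = match.group(1)

# The first json.loads() undoes the string-literal escaping (e.g. \"),
# the second one parses the resulting JSON text into Python objects
data = json.loads(json.loads('"{}"'.format(json_string)))

# print(json.dumps(data, indent=4))  # uncomment this to see all data

print('{:<20} {:<70} {:<10} {:<10} {:<10}'.format('Brand', 'Model', 'Price', 'Sale Date', 'Sale End Date'))
for car in data['car']:
    print('{:<20} {:<70} {:<10} {:<10} {:<10}'.format(car['brand'], car['detailed_model'], car['price'], car['sale_date'], car['sale_end_date']))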

Prints:

Brand                Model                                              Price      Sale Date  Sale End Date
RENAULT              TRAFIC L2H1 1200 1.9 DCI 80 PACK CLIM              2 900 €    22/01/2020 22/01/2020
FIAT                 DUCATO COMBI 3.3 M H2 2.2 MULTIJET                 7 000 €    22/01/2020 22/01/2020
CITROEN              C3 HDI 70 CONFORT                                  3 800 €    22/01/2020 22/01/2020
DS                   DS3 HDI 90 FAP AIRDREAM SO CHIC                    4 000 €    22/01/2020 22/01/2020
VOLKSWAGEN           POLO 1.6 TDI 90 CR FAP CONFORTLINE                 3 200 €    22/01/2020 22/01/2020
PEUGEOT              207 1.6 HDI 90CH BLUE LION ACTIVE                  3 100 €    22/01/2020 22/01/2020
FIAT                 PANDA MY 1.2 8V 69 CH TEAM                         1 600 €    22/01/2020 22/01/2020
FORD                 KUGA 2.0 TDCI 140 DPF 4X4 TITANIUM POWERSHIFT A    5 400 €    22/01/2020 22/01/2020

... and so on.
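Since the goal in the question was to get the data into Excel, here is a minimal sketch, assuming a CSV file that Excel can open is good enough. It writes the five fields used in the answer's print loop with Python's standard `csv` module. The `cars` list is a stand-in for `data['car']`, filled with values copied from the output above; the helper name `write_cars_csv` and the file name `cars.csv` are hypothetical.

```python
import csv

# Keys used in the answer's print loop; any other keys in each car are ignored.
FIELDS = ['brand', 'detailed_model', 'price', 'sale_date', 'sale_end_date']


def write_cars_csv(cars, path, delimiter=','):
    """Write one row per car dict to `path` (hypothetical helper)."""
    # utf-8-sig prepends a BOM so Excel detects UTF-8 and shows the euro sign.
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=delimiter,
                                extrasaction='ignore')
        writer.writeheader()
        writer.writerows(cars)


# Stand-in for data['car']; values copied from the printed output above.
cars = [
    {'brand': 'RENAULT', 'detailed_model': 'TRAFIC L2H1 1200 1.9 DCI 80 PACK CLIM',
     'price': '2 900 €', 'sale_date': '22/01/2020', 'sale_end_date': '22/01/2020'},
    {'brand': 'FIAT', 'detailed_model': 'DUCATO COMBI 3.3 M H2 2.2 MULTIJET',
     'price': '7 000 €', 'sale_date': '22/01/2020', 'sale_end_date': '22/01/2020'},
]

write_cars_csv(cars, 'cars.csv')
```

With the real data, call `write_cars_csv(data['car'], 'cars.csv')`. Excel in a French locale may expect `;` as the separator, in which case pass `delimiter=';'`. If a native .xlsx file is needed, `pandas.DataFrame(data['car']).to_excel('cars.xlsx', index=False)` is an alternative, but it requires pandas and openpyxl to be installed.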
