Beautiful Soup find_all skips a class index when the data is missing from the div

Posted 2024-05-29 03:19:58


I am trying to scrape data from a website. To illustrate my problem, here are some examples.

First iteration

<span class="lot-details-desc right">$7,344 USD
                        </span>
<span class="lot-details-desc right">Automatic
                        </span>
<span class="lot-details-desc right">Mercedes
                        </span>

Second iteration

<span class="lot-details-desc right">$6000 USD
                        </span>
<span class="lot-details-desc right">     #NO DATA HERE
                        </span>
<span class="lot-details-desc right">Mercedes
                        </span>

In a loop, using Beautiful Soup:

price = soup.find_all("span", {"class": "lot-details-desc right"})[0].get_text()
print(price)
trans = soup.find_all("span", {"class": "lot-details-desc right"})[1].get_text()
print(trans)
name = soup.find_all("span", {"class": "lot-details-desc right"})[2].get_text()
print(name)

I get these results:

1st iteration
price=$7,344 USD
trans=Automatic
name=Mercedes 
     

2nd iteration
price=$6000 USD
trans=Mercedes
name=ERROR (index out of range, because this time find_all returns only indexes 0 and 1 instead of 0, 1 and 2)

Any suggestions would be appreciated.


1 answer
User
Answer 1 · posted 2024-05-29 03:19:58

The data on this site is loaded dynamically via JavaScript. You can use the requests module to fetch the data directly from its API:

import re
import json
import requests


url = 'https://www.copart.com/lot/25831510/'
data_url = 'https://www.copart.com/public/data/lotdetails/solr/{lot_id}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

lot_id = re.search(r'lot/(\d+)', url).group(1)


with requests.Session() as s:
    s.get(url, headers=headers)  # load cookies
    data = s.get(data_url.format(lot_id=lot_id), headers=headers).json()

    # uncomment this to see all data:
    # print(json.dumps(data, indent=4))

    name = data['data']['lotDetails']['mkn']
    trans = data['data']['lotDetails']['tsmn']
    price = data['data']['lotDetails']['la']

    print('Name={} Trans={} Price={}'.format(name, trans, price))

Prints:

Name=TOYOTA Trans=AUTOMATIC Price=7344.0
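
Because the JSON fields are looked up by key rather than by position, a missing value should no longer shift the others the way the indexed find_all results did. The sketch below reads the same three keys with dict.get so that an absent field comes back as None instead of raising KeyError. It assumes a lot without a transmission simply omits the tsmn key (the real response might instead contain null or an empty string); the helper name extract_lot_fields and both sample payloads are hypothetical, made up for illustration.

```python
def extract_lot_fields(payload):
    """Return name, transmission and price from a lot-details payload.

    Missing keys become None instead of raising KeyError.
    """
    # The data -> lotDetails nesting and the keys mkn/tsmn/la come from the answer above.
    details = (payload.get("data") or {}).get("lotDetails") or {}
    return {
        "name": details.get("mkn"),
        "trans": details.get("tsmn"),
        "price": details.get("la"),
    }


# Hypothetical payloads for illustration only; a real API response has many more fields.
complete = {"data": {"lotDetails": {"mkn": "TOYOTA", "tsmn": "AUTOMATIC", "la": 7344.0}}}
no_trans = {"data": {"lotDetails": {"mkn": "MERCEDES", "la": 6000.0}}}

for payload in (complete, no_trans):
    fields = extract_lot_fields(payload)
    print("Name={name} Trans={trans} Price={price}".format(**fields))
```

In the loop from the question, the dictionary returned for each lot can be stored as-is, and a None value marks the field that was missing for that lot.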
