How to get this JSON value from window.__INITIAL_STATE__ using BeautifulSoup

Posted 2024-05-15 08:50:01


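I'm scraping a website whose page contains window.__INITIAL_STATE__, which is assigned a huge JSON string. I'm looking for the stock information (the item is currently out of stock), which appears in the JSON tree as follows: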

[Screenshots: first-level tree; second-level tree]

{
    "slotType": "WIDGET",
    "id": 11,
    "parentId": 10002,
    "layoutParams": {
      "margin": "0,24,0,0",
      "orientation": "",
      "widgetHeight": 150,
      "widgetWidth": 12
    },
    "dataId": "1230886539",
    "elementId": "11-AVAILABILITY",
    "hasWidgetDataChanged": true,
    "ttl": 3000,
    "widget": {
      "type": "AVAILABILITY",
      "viewType": "brand",
      "data": {
        "announcementComponent": {
          "action": null,
          "metaData": null,
          "tracking": null,
          "trackingData": null,
          "value": {
            "type": "AnnouncementValue",
            "subTitle": "This item is currently out of stock",
            "title": "Sold Out"
          }
        }
      }
    }
  },

I tried the following, but it doesn't work:

soup = BeautifulSoup(page.content, features="lxml")
print(soup.find(elementID='11-AVAILABILITY').get_text().strip())

1 Answer
User
#1 · Posted 2024-05-15 08:50:01

To parse __INITIAL_STATE__ from the HTML, you can use the following example:

import re
import json
import requests


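# Product page to scrape
url = 'https://www.flipkart.com/sony-310ap-wired-headset/p/itm0527f8b27c68f'
html_data = requests.get(url).text

# Capture the object assigned to window.__INITIAL_STATE__
# (greedy match up to the last "};" on the same line), then decode it
data = re.search(r'window\.__INITIAL_STATE__ = ({.*});', html_data).group(1)
data = json.loads(data)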

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for w in data['pageDataV4']['page']['data']['10002']:
    if w.get("elementId") == "11-AVAILABILITY":
        print(json.dumps(w, indent=4))
        break

Prints:

{
    "slotType": "WIDGET",
    "id": 11,
    "parentId": 10002,
    "layoutParams": {
        "margin": "0,24,0,0",
        "orientation": "",
        "widgetHeight": 150,
        "widgetWidth": 12
    },
    "dataId": "1230886539",
    "elementId": "11-AVAILABILITY",
    "hasWidgetDataChanged": true,
    "ttl": 3000,
    "widget": {
        "type": "AVAILABILITY",
        "viewType": "brand",
        "data": {
            "announcementComponent": {
                "action": null,
                "metaData": null,
                "tracking": null,
                "trackingData": null,
                "value": {
                    "type": "AnnouncementValue",
                    "subTitle": "This item is currently out of stock",
                    "title": "Sold Out"
                }
            }
        }
    }
}
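
If you only need the stock message rather than the whole widget, the same idea can be sketched offline as below. This is a minimal sketch, not the site's documented structure: SAMPLE_HTML is a hypothetical, heavily trimmed stand-in for the downloaded page (the "10-PRICE" entry is invented for illustration), and extract_initial_state and availability are hypothetical helper names. It uses json.JSONDecoder().raw_decode instead of a greedy regex, so the parse stops at the end of the JSON object regardless of what follows it, and it assumes the widget carries an announcementComponent as in the output above.

```python
import json
import re

# Hypothetical stand-in for the downloaded page; the real page is far larger.
SAMPLE_HTML = """
<html><body>
<script>window.__INITIAL_STATE__ = {"pageDataV4": {"page": {"data": {"10002": [
  {"elementId": "10-PRICE", "widget": {"type": "PRICE", "data": {}}},
  {"elementId": "11-AVAILABILITY", "widget": {"type": "AVAILABILITY", "data": {
    "announcementComponent": {"value": {
      "type": "AnnouncementValue",
      "subTitle": "This item is currently out of stock",
      "title": "Sold Out"}}}}}
]}}}};</script>
</body></html>
"""


def extract_initial_state(html):
    """Return the object assigned to window.__INITIAL_STATE__, or None."""
    marker = re.search(r"window\.__INITIAL_STATE__\s*=\s*", html)
    if marker is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and the rest of the page).
    state, _end = json.JSONDecoder().raw_decode(html, marker.end())
    return state


def availability(state, slot="10002", element_id="11-AVAILABILITY"):
    """Return (title, subTitle) of the availability widget, or None if absent."""
    widgets = state.get("pageDataV4", {}).get("page", {}).get("data", {}).get(slot, [])
    for w in widgets:
        if w.get("elementId") == element_id:
            # Assumes the announcementComponent layout shown in the output above.
            value = w["widget"]["data"]["announcementComponent"]["value"]
            return value["title"], value["subTitle"]
    return None


state = extract_initial_state(SAMPLE_HTML)
result = availability(state)
print(result)
```

With the real page, you would pass requests.get(url).text to extract_initial_state instead of SAMPLE_HTML. An in-stock item may not include an announcementComponent at all, so the key lookups may need guarding.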
