Python:从json提取第二级到数据帧

2024-05-16 14:43:39 发布

您现在位置:Python中文网/ 问答频道 /正文

我认为这种格式是简单的json,但我只需要“rows”字段作为数据帧:

{    "data":{
      "filters":null,
      "records":{
         "totalrecords":3031,
         "limit":3031,
         "offset":0,
         "data":{
            "headers":{
               "symbol":"SYMBOL",
               "companyName":"NAME",
               "lastSalePrice":"LAST PRICE",
               "percentageChange":"% CHANGE",
               "oneYearPercentagechange":"1 yr % CHANGE"
            },
            "rows":[
               {
                  "oneYearPercentage":"11.20%",
                  "symbol":"KSCD",
                  "companyName":"KFA Small Cap Quality Dividend Index ETF",
                  "lastSalePrice":"$27.9998",
                  "netChange":"+0.4788",
                  "percentageChange":"1.74%",
                  "deltaIndicator":"up"
               },
               {
                  "oneYearPercentage":"58.70%",
                  "symbol":"KURE",
                  "companyName":"KraneShares MSCI All China Health Care Index ETF",
                  "lastSalePrice":"$41.62",
                  "netChange":"+1.47",
                  "percentageChange":"3.66%",
                  "deltaIndicator":"up"
               }
            ]
         }
      }    } }

如何仅将“rows”字段解析为Pandas数据帧,我尝试了以下方法:

import requests
import pandas as pd
import json


with open('etf-data.json') as json_file:
    data = json.load(json_file)
print(data['data']['rows'])    
df = pd.DataFrame.from_dict(pd.json_normalize(data), orient='columns')

但我不确定要正确解析这里的字段需要多少层


Tags: 数据importjsondataindexsymbolchangerows
3条回答

对于这样的嵌套json,我建议使用jmespath;它可以用更干净的外观来抽象一些索引:

import jmespath
expression = jmespath.compile("data.records.data.rows")
# data is the variable name containing the json data
pd.DataFrame(expression.search(data))

  oneYearPercentage symbol                                       companyName lastSalePrice netChange percentageChange deltaIndicator
0            11.20%   KSCD          KFA Small Cap Quality Dividend Index ETF      $27.9998   +0.4788            1.74%             up
1            58.70%   KURE  KraneShares MSCI All China Health Care Index ETF        $41.62     +1.47            3.66%             up

编辑:我在这里再次看到JSON后就明白了。我的解决办法如下:

import requests
import pandas as pd
import json


with open('etf-data.json') as json_file:
    data = json.load(json_file)
print(data['data']['records']['data']['rows'])    
df = pd.DataFrame.from_dict(data['data']['records']['data']['rows'], orient='columns')
df.head()

其输出为:

^{tb1}$

尝试递归调用键搜索函数

def findrows(js, key):
         for k, v  in js.items():
             if k == key:
                 return v
             elif isinstance(v, dict):
                 found = findrows(v, key) 
                 if found is not None: 
                     return found
                 
pd.DataFrame(findrows(js,'rows'))
Out[179]: 
  oneYearPercentage symbol  ... percentageChange deltaIndicator
0            11.20%   KSCD  ...            1.74%             up
1            58.70%   KURE  ...            3.66%             up
[2 rows x 7 columns]

相关问题 更多 >