Converting a nested JSON file with repeated keys to a DataFrame in Python

Published 2024-04-25 20:37:31


Suppose the following JSON file snippet needs to be flattened in Python.

{
  "locations" : [ {
    "timestampMs" : "1549913792265",
    "latitudeE7" : 323518421,
    "longitudeE7" : -546166813,
    "accuracy" : 13,
    "altitude" : 1,
    "verticalAccuracy" : 2,
    "activity" : [ {
      "timestampMs" : "1549913286057",
      "activity" : [ {
        "type" : "STILL",
        "confidence" : 100
      } ]
    }, {
      "timestampMs" : "1549913730454",
      "activity" : [ {
        "type" : "DRIVING",
        "confidence" : 100
      } ]
    } ]
  }, {
    "timestampMs" : "1549912693813",
    "latitudeE7" : 323518421,
    "longitudeE7" : -546166813,
    "accuracy" : 13,
    "altitude" : 1,
    "verticalAccuracy" : 2,
    "activity" : [ {
      "timestampMs" : "1549911547308",
      "activity" : [ {
        "type" : "ACTIVE",
        "confidence" : 100
      } ]
    }, {
      "timestampMs" : "1549912330473",
      "activity" : [ {
        "type" : "BIKING",
        "confidence" : 100
      } ]
    } ]
  } ]
}

The goal is to convert it into a flat DataFrame like the following:

^{pr2}$

How can this be done when the key "activity" is repeated at different nesting levels?


1 answer
User
Answer #1 · Posted 2024-04-25 20:37:31

Here is a solution using json_normalize (see the pandas documentation), assuming the JSON snippet you posted is stored in a Python dictionary named d.
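For the code below to run, d has to exist first. A minimal sketch of one way to build it, assuming the JSON text is available as a string; the variable name raw is hypothetical, and only the first location from the question is included to keep it short:

```python
import json

# Hypothetical stand-in for the file contents (first location only).
# For a real file, json.load(file_object) would be used instead.
raw = """
{
  "locations": [{
    "timestampMs": "1549913792265",
    "latitudeE7": 323518421,
    "longitudeE7": -546166813,
    "accuracy": 13,
    "altitude": 1,
    "verticalAccuracy": 2,
    "activity": [
      {"timestampMs": "1549913286057",
       "activity": [{"type": "STILL", "confidence": 100}]},
      {"timestampMs": "1549913730454",
       "activity": [{"type": "DRIVING", "confidence": 100}]}
    ]
  }]
}
"""

# json.loads turns the JSON text into nested Python dicts and lists.
d = json.loads(raw)

# The inner records sit two 'activity' levels below each location.
print(d["locations"][0]["activity"][0]["activity"][0])
```

The two nested "activity" keys are visible in the last line: the outer one holds timestamped entries, and the inner one holds the type/confidence records.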

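# pandas >= 1.0 exposes json_normalize at the top level;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize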

# Build a list of paths to JSON fields that will end up as metadata
# in the final DataFrame
meta = list(d['locations'][0].keys())

# meta is now this:
# ['timestampMs',
# 'latitudeE7',
# 'longitudeE7',
# 'accuracy',
# 'altitude',
# 'verticalAccuracy',
# 'activity']

# Almost correct. We need to remove 'activity' and append
# the list ['activity', 'timestampMs'] to meta.
meta.remove('activity')
meta.append(['activity', 'timestampMs'])

# meta is now this:
# ['timestampMs',
# 'latitudeE7',
# 'longitudeE7',
# 'accuracy',
# 'altitude',
# 'verticalAccuracy',
# ['activity', 'timestampMs']]

# Use json_normalize on the list of dicts
# that lives at d['locations'], passing in
# the appropriate record path and metadata
# paths, and specifying the double 'activity_'
# record prefix.
json_normalize(d['locations'], 
               record_path=['activity', 'activity'], 
               meta=meta,
               record_prefix='activity_activity_')

   activity_activity_confidence activity_activity_type    timestampMs  latitudeE7  longitudeE7  accuracy  altitude  verticalAccuracy activity.timestampMs
0                           100                  STILL  1549913792265   323518421   -546166813        13         1                 2        1549913286057
1                           100                DRIVING  1549913792265   323518421   -546166813        13         1                 2        1549913730454
2                           100                 ACTIVE  1549912693813   323518421   -546166813        13         1                 2        1549911547308
3                           100                 BIKING  1549912693813   323518421   -546166813        13         1                 2        1549912330473
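To try the same call end to end, here is a reduced, self-contained sketch. The sample dict is a hypothetical cut-down version of the question's data (one location, two of its fields), and pd.json_normalize assumes pandas 1.0 or later:

```python
import pandas as pd

# Hypothetical reduced sample: one location with two activity entries.
sample = {
    "locations": [{
        "timestampMs": "1549913792265",
        "accuracy": 13,
        "activity": [
            {"timestampMs": "1549913286057",
             "activity": [{"type": "STILL", "confidence": 100}]},
            {"timestampMs": "1549913730454",
             "activity": [{"type": "DRIVING", "confidence": 100}]},
        ],
    }]
}

df = pd.json_normalize(
    sample["locations"],
    # Walk location -> outer 'activity' list -> inner 'activity' list.
    record_path=["activity", "activity"],
    # Top-level fields, plus the timestamp of the outer activity entry.
    meta=["timestampMs", "accuracy", ["activity", "timestampMs"]],
    # Prefix the record columns so they cannot clash with the meta columns.
    record_prefix="activity_activity_",
)

print(df)
```

Each innermost record becomes one row, with the location-level fields and the outer activity timestamp repeated alongside it.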

Edit

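If the ['activity', 'activity'] record path is sometimes missing, the code above will raise an error. The following workaround should work for this specific case, but it is fragile and may be unacceptably slow depending on the size of the input data: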

^{pr2}$
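Independently of the workaround above, one way to tolerate a missing record path is to flatten the structure with plain loops and default values. This is only a sketch under stated assumptions, not the answer author's code: the function name flatten_locations is hypothetical, the output column names simply mirror the ones used earlier, and missing values are filled with None.

```python
def flatten_locations(d):
    """Return one flat dict per innermost activity record.

    A location or activity entry without an 'activity' list still
    produces one row, with the missing fields set to None.
    """
    rows = []
    for loc in d.get("locations", []):
        # Every top-level field except the nested list is per-location metadata.
        base = {k: v for k, v in loc.items() if k != "activity"}
        # A missing or empty list is treated as a single empty entry.
        for outer in loc.get("activity") or [{}]:
            for inner in outer.get("activity") or [{}]:
                row = dict(base)
                row["activity.timestampMs"] = outer.get("timestampMs")
                row["activity_activity_type"] = inner.get("type")
                row["activity_activity_confidence"] = inner.get("confidence")
                rows.append(row)
    return rows


# Hypothetical sample: the second location has no 'activity' key at all.
sample = {
    "locations": [
        {"timestampMs": "1549913792265", "latitudeE7": 323518421,
         "activity": [{"timestampMs": "1549913286057",
                       "activity": [{"type": "STILL", "confidence": 100}]}]},
        {"timestampMs": "1549912693813", "latitudeE7": 323518421},
    ]
}

rows = flatten_locations(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pandas.DataFrame(rows) to obtain the flat table. Whether a location without activities should yield a row of None values or be dropped is a design choice; this sketch keeps it.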
