Converting a nested JSON file with duplicate keys to a DataFrame in Python

{ "locations" : [ { "timestampMs" : "1549913792265", "latitudeE7" : 323518421, "longitudeE7" : -546166813, "accuracy" : 13, "altitude" : 1, "verticalAccuracy" : 2, "activity" : [ { "timestampMs" : "1549913286057", "activity" : [ { "type" : "STILL", "confidence" : 100 } ] }, { "timestampMs" : "1549913730454", "activity" : [ { "type" : "DRIVING", "confidence" : 100 } ] } ] }, { "timestampMs" : "1549912693813", "latitudeE7" : 323518421, "longitudeE7" : -546166813, "accuracy" : 13, "altitude" : 1, "verticalAccuracy" : 2, "activity" : [ { "timestampMs" : "1549911547308", "activity" : [ { "type" : "ACTIVE", "confidence" : 100 } ] }, { "timestampMs" : "1549912330473", "activity" : [ { "type" : "BIKING", "confidence" : 100 } ] } ] } ] }

1 answer

User

#1 · Posted 2024-04-25 20:37:31

Here is a solution using json_normalize (documentation), assuming the JSON snippet you posted is in a Python dictionary named d.
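For readers starting from the raw JSON text rather than a dictionary, here is a minimal sketch of how d might be built with the standard json module. The string raw is a shortened, hypothetical stand-in for the posted data (one location with one activity), not the full file.

```python
import json

# Shortened stand-in for the JSON in the question (one location, one activity).
raw = '''
{"locations": [
  {"timestampMs": "1549913792265", "latitudeE7": 323518421,
   "longitudeE7": -546166813, "accuracy": 13, "altitude": 1,
   "verticalAccuracy": 2,
   "activity": [
     {"timestampMs": "1549913286057",
      "activity": [{"type": "STILL", "confidence": 100}]}
   ]}
]}
'''

# Parse the text into a plain Python dict.
# For a file on disk, json.load(file_object) does the same job.
d = json.loads(raw)

# The top-level keys of the first location, used later to build `meta`.
first_keys = list(d['locations'][0].keys())
print(first_keys)
```

Note that 'activity' appears both as a top-level key of each location and again inside each element of that list, which is why the record path below repeats it.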

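# pandas >= 1.0 exposes json_normalize at the top level.
# On older versions, use: from pandas.io.json import json_normalize
from pandas import json_normalize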

# Build a list of paths to JSON fields that will end up as metadata
# in the final DataFrame
meta = list(d['locations'][0].keys())

# meta is now this:
# ['timestampMs',
# 'latitudeE7',
# 'longitudeE7',
# 'accuracy',
# 'altitude',
# 'verticalAccuracy',
# 'activity']

# Almost correct. We need to remove 'activity' and append
# the list ['activity', 'timestampMs'] to meta.
meta.remove('activity')
meta.append(['activity', 'timestampMs'])

# meta is now this:
# ['timestampMs',
# 'latitudeE7',
# 'longitudeE7',
# 'accuracy',
# 'altitude',
# 'verticalAccuracy',
# ['activity', 'timestampMs']]

# Use json_normalize on the list of dicts
# that lives at d['locations'], passing in
# the appropriate record path and metadata
# paths, and specifying the double 'activity_'
# record prefix.
json_normalize(d['locations'], 
               record_path=['activity', 'activity'], 
               meta=meta,
               record_prefix='activity_activity_')

   activity_activity_confidence activity_activity_type    timestampMs  latitudeE7  longitudeE7  accuracy  altitude  verticalAccuracy activity.timestampMs
0                           100                  STILL  1549913792265   323518421   -546166813        13         1                 2        1549913286057
1                           100                DRIVING  1549913792265   323518421   -546166813        13         1                 2        1549913730454
2                           100                 ACTIVE  1549912693813   323518421   -546166813        13         1                 2        1549911547308
3                           100                 BIKING  1549912693813   323518421   -546166813        13         1                 2        1549912330473
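To make explicit what the record path and the metadata paths do in the call above, the same flattening can be sketched in plain Python. This is only an illustration of the idea, not how pandas implements it; flatten_locations and sample are hypothetical names, and sample holds just the first location from the question.

```python
def flatten_locations(locations):
    """Return one flat dict per innermost activity record."""
    rows = []
    for loc in locations:
        # Top-level scalar fields play the role of `meta`.
        loc_meta = {k: v for k, v in loc.items() if k != 'activity'}
        # Walking loc['activity'][i]['activity'][j] mirrors
        # record_path=['activity', 'activity'].
        for outer in loc['activity']:
            for inner in outer['activity']:
                # Prefix the record fields, as record_prefix does.
                row = {'activity_activity_' + k: v for k, v in inner.items()}
                row.update(loc_meta)
                # Mirrors the nested meta path ['activity', 'timestampMs'].
                row['activity.timestampMs'] = outer['timestampMs']
                rows.append(row)
    return rows


sample = [{
    'timestampMs': '1549913792265',
    'latitudeE7': 323518421,
    'longitudeE7': -546166813,
    'accuracy': 13,
    'altitude': 1,
    'verticalAccuracy': 2,
    'activity': [
        {'timestampMs': '1549913286057',
         'activity': [{'type': 'STILL', 'confidence': 100}]},
        {'timestampMs': '1549913730454',
         'activity': [{'type': 'DRIVING', 'confidence': 100}]},
    ],
}]

rows = flatten_locations(sample)
```

Passing rows to pandas.DataFrame(rows) should give a frame with the same column names as the output above, although the column order may differ.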

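Edit

If the ['activity', 'activity'] record path is sometimes missing, the code above will raise an error. The following workaround should work for this specific case, but it is brittle and may be unacceptably slow depending on the size of your input data: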

[The workaround code was not preserved in this copy of the answer.]
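The original workaround code is not available here, so what follows is not the author's code. It is a minimal sketch of one possible approach, assuming that a missing path should produce a row with empty activity fields: before normalizing, every location is given a placeholder ['activity', 'activity'] path. The names fill_missing_activity and EMPTY_ACTIVITY are hypothetical.

```python
import copy

# Hypothetical placeholder used when a location has no activity data.
EMPTY_ACTIVITY = [{'timestampMs': None,
                   'activity': [{'type': None, 'confidence': None}]}]


def fill_missing_activity(locations):
    """Return a copy of `locations` in which every entry has the nested path."""
    filled = []
    for loc in locations:
        loc = copy.deepcopy(loc)  # leave the caller's data untouched
        if not loc.get('activity'):
            loc['activity'] = copy.deepcopy(EMPTY_ACTIVITY)
        for outer in loc['activity']:
            # Inner list missing or empty: add a single empty record.
            if not outer.get('activity'):
                outer['activity'] = [{'type': None, 'confidence': None}]
            outer.setdefault('timestampMs', None)
        filled.append(loc)
    return filled


locations = [
    {'timestampMs': '1',
     'activity': [{'timestampMs': '2',
                   'activity': [{'type': 'STILL', 'confidence': 100}]}]},
    {'timestampMs': '3'},  # no 'activity' key at all
]

filled = fill_missing_activity(locations)
```

The result can then be passed to json_normalize in place of d['locations'], with the same record_path, meta and record_prefix as before. Like the original workaround, this copies and walks every record in Python, so it may be slow on large inputs.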
