如何将多个深度嵌套的JSON文件展平到一个数据帧中?

2024-04-29 11:37:22 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在尝试扁平化嵌套很深的json文件

我有22个json文件,我想收集在一个数据帧中。我设法用json_normalize将它们展平到第二级,但我无法进一步解析它。有时JSON有超过5个级别

我想提取_idactType以及位于不同级别的“children”中的所有文本数据。下面是Json文件的示例。非常感谢你的帮助

{
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}
import pandas as pd

# load file
df = pd.read_json('test.json')

# display(df)
     _id  actType                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   entries
0  test1  FINDING  {'text': 'U Ergebnis:', 'isDocumentationNode': False, 'children': [{'text': 'U3: Standartext', 'isDocumentationNode': True, 'children': []}, {'text': 'Brückner durchgeführt o.p.B.', 'isDocumentationNode': True, 'children': []}, {'text': 'Normale körperliche und altersgerecht Entwicklung', 'isDocumentationNode': True, 'children': [{'text': 'J1/2', 'isDocumentationNode': False, 'children': [{'text': 'Schule:', 'isDocumentationNode': True, 'children': [{'text': 'Ziel Abitur', 'isDocumentationNode': True, 'children': [{'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}, {'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}]}]}]}]}]}
  • 这会在'entries'列中产生一个嵌套的dict,但我需要一个平坦、宽的数据帧,所有键都作为列

Tags: 文件数据textidjsontrue级别entries
1条回答
网友
1楼 · 发布于 2024-04-29 11:37:22
import json
import pandas as pd

# list of files
files = ['test1.json', 'test2.json']

# list to add dataframe from each file
df_list = list()

# iterate through files
for file in files:
    with open(file, 'r', encoding='utf-8') as f:

        # read with json
        data = json.loads(f.read())

        # flatten_json into a dataframe and add to the dataframe list
        df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)
        
# concat all dataframes together
df = pd.concat(df_list).reset_index(drop=True)

# display(df)
     _id  actType entries_0_text entries_0_isDocumentationNode entries_0_children_0_text entries_0_children_0_isDocumentationNode     entries_0_children_1_text entries_0_children_1_isDocumentationNode                          entries_0_children_2_text entries_0_children_2_isDocumentationNode entries_0_children_2_children_0_text entries_0_children_2_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_text entries_0_children_2_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_1_text entries_0_children_2_children_0_children_0_children_0_children_1_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_2_text entries_0_children_2_children_0_children_0_children_0_children_2_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_3_text entries_0_children_2_children_0_children_0_children_0_children_3_isDocumentationNode
0  test1  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True
1  test2  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                   NaN                                                                                  NaN                                                                   NaN                                                                                  NaN

资料

  • test1.json
{
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}

  • test2.json
{
    "_id": "test2",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}

相关问题 更多 >