Converting nested JSON to a CSV file in Python

{ "company_number": "12345678", "data": { "address": { "address_line_1": "Address 1", "locality": "Henley-On-Thames", "postal_code": "RG9 1DP", "premises": "161", "region": "Oxfordshire" }, "country_of_residence": "England", "date_of_birth": { "month": 2, "year": 1977 }, "etag": "26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00", "kind": "individual-person-with-significant-control", "links": { "self": "/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl" }, "name": "John M Smith", "name_elements": { "forename": "John", "middle_name": "M", "surname": "Smith", "title": "Mrs" }, "nationality": "Vietnamese", "natures_of_control": [ "ownership-of-shares-50-to-75-percent" ], "notified_on": "2016-04-06" } }

1 answer

Answer 1 · posted 2024-05-16 01:13:48

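For the given JSON data, you can do this by walking the JSON structure and returning a list of all the leaf nodes.

This assumes that your structure is always consistent. If entries can have different fields, see the second approach.

For example: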

import json
import csv

def get_leaves(item, key=None):
    """Return a list of (key, value) pairs, one for each leaf of a nested structure."""
    if isinstance(item, dict):
        leaves = []
        for i in item.keys():
            leaves.extend(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        # Items in a list inherit the key of the list that contains them
        leaves = []
        for i in item:
            leaves.extend(get_leaves(i, key))
        return leaves
    else:
        return [(key, item)]


with open('json.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    write_header = True

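    # json.txt is expected to hold a list of entries
    for entry in json.load(f_input):
        # Sort on the key only, so values of different types are never compared
        leaf_entries = sorted(get_leaves(entry), key=lambda kv: kv[0])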

        if write_header:
            csv_output.writerow([k for k, v in leaf_entries])
            write_header = False

        csv_output.writerow([v for k, v in leaf_entries])

If the JSON data is a list of entries in the given format, you should get output like the following:

address_line_1,company_number,country_of_residence,etag,forename,kind,locality,middle_name,month,name,nationality,natures_of_control,notified_on,postal_code,premises,region,self,surname,title,year
Address 1,12345678,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977
Address 1,12345679,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977
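
Note that the sample in the question is a single JSON object rather than a list, so iterating over the result of json.load() directly would walk its top-level keys instead of its entries. A minimal sketch of one way to accept both shapes, assuming a hypothetical helper named as_entries (it is not part of the original answer):

```python
import json

def as_entries(data):
    """Return a list of entries, wrapping a single JSON object in a list.

    Hypothetical helper for illustration, not part of the original answer.
    """
    if isinstance(data, list):
        return data
    return [data]

# A single object, like the sample in the question (shortened here)
single = json.loads('{"company_number": "12345678", "data": {"name": "John M Smith"}}')
# A list of such objects, which is what the loop over entries expects
many = json.loads('[{"company_number": "12345678"}, {"company_number": "12345679"}]')

print(len(as_entries(single)))  # 1
print(len(as_entries(many)))    # 2
```

With this in place, the loop could iterate over as_entries(json.load(f_input)) instead of json.load(f_input).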

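If each entry can contain different (or possibly missing) fields, a better approach is to use a DictWriter. In this case, all entries need to be processed first to determine the complete list of possible fieldnames, so that the correct header can be written.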

import json
import csv

def get_leaves(item, key=None):
    """Return a dict mapping each leaf's key to its value."""
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        # All items share the list's key, so a later item overwrites an earlier one
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}


with open('json.txt') as f_input:
    json_data = json.load(f_input)

# First parse all entries to get the complete fieldname list
fieldnames = set()

for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)
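
One caveat with the versions above: leaves are keyed by their innermost name only, so two fields with the same name at different nesting levels end up under the same column name, and in the dictionary version one overwrites the other (as do multiple items in one list). A minimal sketch of one possible workaround, assuming a hypothetical flatten helper that builds dotted-path column names; the naming scheme and the in-memory sample entries are made up for illustration and are not from the original answer:

```python
import csv
import io

def flatten(item, prefix=""):
    """Map each leaf to a dotted path such as 'data.address.region'.

    Hypothetical helper for illustration; list items get their index
    appended to the path so that they do not overwrite each other.
    """
    if isinstance(item, dict):
        pairs = item.items()
    elif isinstance(item, list):
        pairs = enumerate(item)
    else:
        return {prefix: item}

    leaves = {}
    for key, value in pairs:
        path = f"{prefix}.{key}" if prefix else str(key)
        leaves.update(flatten(value, path))
    return leaves

# Made-up entries with differing fields and a repeated "name" key
entries = [
    {"name": "A", "data": {"name": "John", "tags": ["x", "y"]}},
    {"name": "B", "data": {"region": "Oxfordshire"}},
]

rows = [flatten(entry) for entry in entries]
fieldnames = sorted(set().union(*rows))

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # fields missing from a row are written as empty cells
print(buffer.getvalue())
```

The trade-off is longer column names, in exchange for columns that stay distinct when the same key appears in more than one place.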
