Flattening a nested dictionary while checking its contents

Asked on 2025-04-18 13:12

I have a dictionary that looks like this:

source = {

    'Section 1' : {
        'range'       : [0, 200],
        'template'    : 'ID-LOA-XXX',
        'nomenclature': True
    },

    'Section 2' : {
        'range'       : [201, 800],
        'template'    : 'ID-EPI-XXX',
        'nomenclature': False,
        'Subsection 1' : {
            'range'       : [0, 400],
            'template'    : 'ID-EPI-S1-XXX',
            'nomenclature': False,
            'Subsubsection 1' : {
                'range'       : [0, 400],
                'template'    : 'ID-EPI-S12-XXX',
                'nomenclature': False
            }
        },
        'Subsection 2' : {
            'range'       : [0, 400],
            'template'    : 'ID-EPI-S2-XXX',
            'nomenclature': False
        }
    }, 

    # etc.

}

The dictionary is loaded from a JSON file. I want to "flatten" it into the following dictionary:

target = {

    'Section 1' : {
        'range'       : [0, 200],
        'template'    : 'ID-LOA-XXX',
        'nomenclature': True,
        'location'    : './Section 1/'
    },

    'Section 2' : {
        'range'       : [201, 800],
        'template'    : 'ID-EPI-XXX',
        'nomenclature': False,
        'location'    : './Section 2/'
    },

    'Subsection 1' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S1-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 1/'
    },

    'Subsubsection 1' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S12-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 1/Subsubsection 1/'
    },

    'Subsection 2' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S2-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 2/'
    },

    # etc.

}

I could change the way the original JSON file is generated, but I'd rather leave it alone.

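In short, the JSON file is structured like this: every section has at least three keys and may have others. Any additional key is treated as a subsection of the current section, and each subsection is a dict with the same properties. In principle this structure can be nested to any depth.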
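Under that convention, a section's subsections are simply the entries whose keys are not among the required fields. A minimal sketch, assuming the three field names from the question; `REQUIRED_KEYS` and `iter_subsections` are hypothetical names, not part of the original code:

```python
# Hypothetical constant: the three required field names from the question
REQUIRED_KEYS = {"range", "template", "nomenclature"}


def iter_subsections(section):
    """Yield (name, content) for every key that is not a required field."""
    for key, value in section.items():
        if key not in REQUIRED_KEYS:
            yield key, value


section_2 = {
    "range": [201, 800],
    "template": "ID-EPI-XXX",
    "nomenclature": False,
    "Subsection 1": {"range": [0, 400], "template": "ID-EPI-S1-XXX", "nomenclature": False},
    "Subsection 2": {"range": [0, 400], "template": "ID-EPI-S2-XXX", "nomenclature": False},
}
print([name for name, _ in iter_subsections(section_2)])  # ['Subsection 1', 'Subsection 2']
```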

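I'd also like to perform some checks:

Check that all the required fields ('range', 'template' and 'nomenclature') are present
Make sure the values of these required fields meet certain requirements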
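Both checks can be done per section in a single pass over the required fields, reporting a missing field separately from an invalid value. A minimal sketch, assuming the validators from the question's code below; `check_fields` is a hypothetical helper that returns error messages instead of raising, and it does not descend into subsections:

```python
# Validators mirroring the requirements in the question
key_requirements = {
    "nomenclature": lambda x: isinstance(x, bool),
    "template": lambda x: isinstance(x, str) and "X" in x,
    "range": lambda x: (isinstance(x, list) and len(x) == 2
                        and all(isinstance(y, int) for y in x) and x[1] > x[0]),
}


def check_fields(name, section):
    """Return a list of error messages for one section (subsections are not visited)."""
    errors = []
    for key, is_valid in key_requirements.items():
        if key not in section:
            errors.append(f"{name}: missing required field {key!r}")
        elif not is_valid(section[key]):
            errors.append(f"{name}: invalid value for {key!r}: {section[key]!r}")
    return errors


print(check_fields("Section 1", {"range": [0, 200], "template": "ID-LOA-XXX", "nomenclature": True}))  # []
print(check_fields("Bad", {"range": [5, 1], "template": "ID-LOA"}))
```

The second call reports three problems: `'nomenclature'` is missing, the template contains no `X`, and the range is not increasing.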

So far, I've only got the checks done:

import json

key_requirements = {
    "nomenclature": lambda x : isinstance(x, bool),
    "template"    : lambda x : isinstance(x, str)  and "X" in x,
    "range"       : lambda x : isinstance(x, list) and len(x)==2 and all(isinstance(y, int) for y in x) and x[1] > x[0]
}

def checkSection(section):

    for key in key_requirements:
        if key not in section:
            # error: required key not present
            pass

    for key in section:
        if key not in key_requirements:
            # any other key is a subsection: check it recursively
            checkSection(section[key])

        elif not key_requirements[key]( section[key] ):
            # error: assertion failed
            pass

for key in source:  # or: json.load(open(myJsonFile))
    checkSection(source[key])

But now, no matter how much coffee I drink, I can't come up with an efficient, elegant, Pythonic way to do the transformation I want...

Any suggestions or ideas?


3 Answers

I ended up with this solution:

import os

key_requirements = {
    "nomenclature": lambda x : isinstance(x, bool),
    "template"    : lambda x : isinstance(x, str)  and "X" in x,
    "range"       : lambda x : isinstance(x, list) and len(x)==2 and all(isinstance(y, int) for y in x) and x[1] > x[0]
}


def checkAndFlattenData(data):

    def merge_dicts(dict1, dict2):
        # New dict with the entries of both (dict2 wins on key clashes)
        return {**dict1, **dict2}


    def check_section(section, section_content):

        section_out = {
            'range'       : section_content['range'],
            'template'    : section_content['template'],
            'nomenclature': section_content['nomenclature'],
            'location'    : section
        }
        nested_section_out = {}

        for key, value in section_content.items():

            if key not in key_requirements:
                if not isinstance(value, dict):
                    # error: invalid key
                    pass

                else:
                    nested_section_out[key], recurse_out = check_section(key, value)
                    nested_section_out = merge_dicts(nested_section_out, recurse_out)


            elif not key_requirements[key](value):
                print("ASSERTION FAILED!")  # error: field assertion failed

        # Prefix every descendant's location with this section's name
        for key in nested_section_out:
            nested_section_out[key]['location'] = os.path.join(section, nested_section_out[key]['location'])

        return section_out, nested_section_out

    new_data = {}
    for key, value in data.items():
        new_data[key], nested_data = check_section(key, value)
        new_data = merge_dicts(new_data, nested_data)

    # Finally make every location relative to '.'
    for key in new_data:
        new_data[key]['location'] = os.path.join('.', new_data[key]['location'])

    return new_data


target = checkAndFlattenData(source)

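But I still feel this could be done in a more "Pythonic" (or more efficient) way... If anyone has suggestions, feel free to copy this and propose improvements in a separate answer, so that I can accept it.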
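One possible more compact variant, offered as a sketch rather than a definitive implementation: a recursive generator walks the tree with `yield from`, checks each section as it goes, and builds locations by string concatenation so they always use `/` and end with a trailing slash, as in the target. It assumes Python 3, that errors should raise `ValueError` on the first problem, and that section names are unique across the whole tree; `flatten_sections`, `check_and_flatten` and `REQUIRED` are hypothetical names.

```python
# Hypothetical validators, same rules as in the question
REQUIRED = {
    "nomenclature": lambda x: isinstance(x, bool),
    "template": lambda x: isinstance(x, str) and "X" in x,
    "range": lambda x: (isinstance(x, list) and len(x) == 2
                        and all(isinstance(y, int) for y in x) and x[1] > x[0]),
}


def flatten_sections(sections, parent="./"):
    """Yield (name, flat_section) pairs depth first; raise ValueError on bad input."""
    for name, content in sections.items():
        location = f"{parent}{name}/"
        if not isinstance(content, dict):
            raise ValueError(f"{location}: expected a dict, got {type(content).__name__}")
        flat = {"location": location}
        children = {}
        for key, value in content.items():
            if key in REQUIRED:
                if not REQUIRED[key](value):
                    raise ValueError(f"{location}: invalid {key!r}: {value!r}")
                flat[key] = value
            else:
                children[key] = value  # any other key is a subsection
        missing = REQUIRED.keys() - flat.keys()
        if missing:
            raise ValueError(f"{location}: missing {sorted(missing)}")
        yield name, flat
        # Recurse: children use this section's location as their parent path
        yield from flatten_sections(children, location)


def check_and_flatten(data):
    """Validate `data` and return a flat {name: section} dict."""
    result = {}
    for name, flat in flatten_sections(data):
        if name in result:
            raise ValueError(f"duplicate section name {name!r}")
        result[name] = flat
    return result


source = {
    "Section 1": {"range": [0, 200], "template": "ID-LOA-XXX", "nomenclature": True},
    "Section 2": {
        "range": [201, 800], "template": "ID-EPI-XXX", "nomenclature": False,
        "Subsection 1": {
            "range": [0, 400], "template": "ID-EPI-S1-XXX", "nomenclature": False,
            "Subsubsection 1": {"range": [0, 400], "template": "ID-EPI-S12-XXX", "nomenclature": False},
        },
    },
}
target = check_and_flatten(source)
print(target["Subsubsection 1"]["location"])  # ./Section 2/Subsection 1/Subsubsection 1/
```

Raising on the first error keeps the sketch short; if you would rather report every problem at once, append messages to a list instead of raising.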

Answered on 2025-04-18 by Python大师

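This code works for your case (note that it only lifts direct subsections one level up and does not add 'location'):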

output = {}
for key, value in source.items():
    item = {}
    for nested_key, nested_value in value.items():
        if isinstance(nested_value, dict):
            # Copy the subsection one level up (deeper levels stay nested)
            output[nested_key] = dict(nested_value)
        else:
            item[nested_key] = nested_value
    output[key] = item
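To see the scope of this loop, it can be wrapped in a function (the name `flatten_one_level` is hypothetical) and run on a three-level input with the fields trimmed for brevity: `Subsubsection 1` is not lifted to the top level but stays nested inside `Subsection 1`, and no `'location'` key is added.

```python
def flatten_one_level(source):
    # Same logic as the loop above, wrapped in a function for testing
    output = {}
    for key, value in source.items():
        item = {}
        for nested_key, nested_value in value.items():
            if isinstance(nested_value, dict):
                output[nested_key] = dict(nested_value)
            else:
                item[nested_key] = nested_value
        output[key] = item
    return output


sample = {
    "Section 2": {
        "range": [201, 800],
        "Subsection 1": {"range": [0, 400], "Subsubsection 1": {"range": [0, 400]}},
    }
}
flat = flatten_one_level(sample)
print(sorted(flat))                                # ['Section 2', 'Subsection 1']
print("Subsubsection 1" in flat["Subsection 1"])   # True
```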

Answered on 2025-04-18 by Python大师

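This has to be solved with a recursive traversal. Unless you want to use a third-party library (such solutions do exist), you'll have to write a simple recursive walker yourself.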

Note: the paths may look different from what you expect, because I'm on Windows, where os.path.join uses backslashes.
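If you want forward slashes on every platform, as in the asker's target, one option is `posixpath.join` instead of `os.path.join`. `posixpath` is the standard-library module that `os.path` refers to on POSIX systems, and it can be imported on Windows too. A small comparison, using `ntpath` to show what Windows would produce:

```python
import ntpath
import posixpath

parts = (".", "Section 2", "Subsection 1")
print(ntpath.join(*parts))     # .\Section 2\Subsection 1
print(posixpath.join(*parts))  # ./Section 2/Subsection 1
```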

Implementation

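import os

def flatten(source):
    target = {}

    def helper(src, path='.', last_key=None):
        # Every dict below the top level is a section: register it with its path
        if last_key:
            target[last_key] = {'location': path}
        for key, value in src.items():
            if isinstance(value, dict):
                # Nested dict -> subsection: recurse with the extended path
                helper(value, os.path.join(path, key), key)
            else:
                # Plain value -> field of the current section
                target[last_key][key] = value

    helper(source)
    return target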

Output

>>> pprint.pprint(source)
{'Section 1': {'nomenclature': True,
               'range': [0, 200],
               'template': 'ID-LOA-XXX'},
 'Section 2': {'Subsection 1': {'Subsubsection 1': {'nomenclature': False,
                                                    'range': [0, 400],
                                                    'template': 'ID-EPI-S12-XXX'},
                                'nomenclature': False,
                                'range': [0, 400],
                                'template': 'ID-EPI-S1-XXX'},
               'Subsection 2': {'nomenclature': False,
                                'range': [0, 400],
                                'template': 'ID-EPI-S2-XXX'},
               'nomenclature': False,
               'range': [201, 800],
               'template': 'ID-EPI-XXX'}}
>>> pprint.pprint(flatten(source))
{'Section 1': {'location': '.\\Section 1',
               'nomenclature': True,
               'range': [0, 200],
               'template': 'ID-LOA-XXX'},
 'Section 2': {'location': '.\\Section 2',
               'nomenclature': False,
               'range': [201, 800],
               'template': 'ID-EPI-XXX'},
 'Subsection 1': {'location': '.\\Section 2\\Subsection 1',
                  'nomenclature': False,
                  'range': [0, 400],
                  'template': 'ID-EPI-S1-XXX'},
 'Subsection 2': {'location': '.\\Section 2\\Subsection 2',
                  'nomenclature': False,
                  'range': [0, 400],
                  'template': 'ID-EPI-S2-XXX'},
 'Subsubsection 1': {'location': '.\\Section 2\\Subsection 1\\Subsubsection 1',
                     'nomenclature': False,
                     'range': [0, 400],
                     'template': 'ID-EPI-S12-XXX'}}
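Because the flattened dictionary is keyed by section name alone, two subsections with the same name under different parents (for example a `'Subsection 1'` under both `'Section 2'` and a hypothetical `'Section 3'`) would silently overwrite each other. A minimal sketch of a pre-check, assuming the three field names from the question; `section_names` and `find_duplicate_names` are hypothetical helpers, and the fields in the sample tree are trimmed for brevity:

```python
from collections import Counter

REQUIRED_KEYS = {"range", "template", "nomenclature"}


def section_names(sections):
    """Yield every section name in the tree, depth first."""
    for name, content in sections.items():
        yield name
        # Any non-required key is a subsection
        children = {k: v for k, v in content.items() if k not in REQUIRED_KEYS}
        yield from section_names(children)


def find_duplicate_names(sections):
    """Return the sorted list of names that occur more than once."""
    counts = Counter(section_names(sections))
    return sorted(name for name, n in counts.items() if n > 1)


tree = {
    "Section 2": {"range": [201, 800], "Subsection 1": {"range": [0, 400]}},
    "Section 3": {"range": [801, 900], "Subsection 1": {"range": [0, 100]}},
}
print(find_duplicate_names(tree))  # ['Subsection 1']
```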

Answered on 2025-04-18 by Python大师
