How do I create a custom nested dict?

Posted 2024-05-20 22:45:46

I am trying to create a custom nested dict from a file that I read in Python, without using the collections module. My data structure is shown below.

d = {'employee': 
     {'developer1': 
      {'id1':
       {'language': ('c', 'java'),
        'worked_area':('delhi', 'kolkata')
       },
       'id2':
        {'language':('python' , 'c++'),
         'worked_area':('kolkata',)
        }
       },
      'devloper2': 
      {'id1':
       {'language': ('c', 'java'),
        'worked_area':('delhi', 'kolkata')
       }
      }
     }
    }

The data structure is read with the following code:

for k1, v1 in d.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, v5 in v3.items():
                print(k1, k2, k3, k4, v5)

File: text1.txt

employee    developer1  id1 language    c
employee    developer1  id1 language    java
employee    developer1  id1 worked_area delhi
employee    developer1  id1 worked_area kolkata
employee    developer1  id2 language    python
employee    developer1  id2 language    c++
employee    developer1  id2 worked_area kolkata
employee    devloper2   id1 language    c
employee    devloper2   id1 language    java
employee    devloper2   id1 worked_area delhi
employee    devloper2   id1 worked_area kolkata

Now I am trying to build the dictionary shown above from this text file, and then print its contents with the loop shown above.

import re
d = {}
fh = open('text1.txt', 'r')
for i, line in enumerate(fh):
    line = line.strip()
    tmp = re.split(r'\t+', line)
    d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])

However, running this code raises the following error:

KeyError: 'employee'

So I need help writing the code that builds this data structure.


2 answers

As requested:

Using only the built-in dict, you can do this:

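import re

d = {}
with open('text1.txt', 'r') as fh:  # the with block closes the file when done
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tmp = re.split(r'\t+', line)  # fields are tab-separated
        # create each missing level before descending into it
        if tmp[0] not in d:
            d[tmp[0]] = {}
        if tmp[1] not in d[tmp[0]]:
            d[tmp[0]][tmp[1]] = {}
        if tmp[2] not in d[tmp[0]][tmp[1]]:
            d[tmp[0]][tmp[1]][tmp[2]] = {}
        if tmp[3] not in d[tmp[0]][tmp[1]][tmp[2]]:
            d[tmp[0]][tmp[1]][tmp[2]][tmp[3]] = []
        # the innermost level is a list, so values can be appended
        d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])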

With a bit more thought there is probably a more elegant solution. People must have run into this before, for example anyone who works with JSON files.
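One possible way to shorten the chain of membership checks, still without the collections module, is dict.setdefault, which returns the existing value for a key or stores and returns the given default. The sketch below is only an illustration: the function name build_nested and the sample list are hypothetical, and it assumes that no field contains embedded whitespace, since it splits on any whitespace rather than on tabs only.

```python
def build_nested(lines):
    """Build a four-level nested dict from whitespace-separated records."""
    d = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        k1, k2, k3, k4, value = parts
        # setdefault creates each missing level and returns it,
        # so the calls can be chained down to the leaf list
        leaf = d.setdefault(k1, {}).setdefault(k2, {}).setdefault(k3, {}).setdefault(k4, [])
        leaf.append(value)
    return d


# Hypothetical sample records in the same layout as text1.txt
sample = [
    "employee\tdeveloper1\tid1\tlanguage\tc",
    "employee\tdeveloper1\tid1\tlanguage\tjava",
    "employee\tdeveloper1\tid1\tworked_area\tdelhi",
]
result = build_nested(sample)
print(result)
# {'employee': {'developer1': {'id1': {'language': ['c', 'java'], 'worked_area': ['delhi']}}}}
```

To read from the real file, an open file object can be passed directly, for example build_nested(open('text1.txt')), since iterating over a file yields its lines.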

Your problem is that you initialize an empty dict, which has no 'employee' key, so you get a KeyError:

>>> d = {}
>>> d['employee']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'employee'

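The next problem is that the value stored under the 'employee' key must itself be a dict, and so on for each level below it. To solve this, you can use nested defaultdicts.

Since the nesting depth is constant and known in advance, only one tree needs to be initialized. It is a defaultdict of defaultdicts of defaultdicts of defaultdicts of lists :)

Once this tree is initialized, appending information to the leaves is easy. Note that lists should be used rather than tuples: the number of languages is not known until the whole file has been read, and values cannot be appended to a tuple.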

data = """employee    developer1  id1 language    c
employee    developer1  id1 language    java
employee    developer1  id1 worked_area delhi
employee    developer1  id1 worked_area kolkata
employee    developer1  id2 language    python
employee    developer1  id2 language    c++
employee    developer1  id2 worked_area kolkata
employee    devloper2   id1 language    c
employee    devloper2   id1 language    java
employee    devloper2   id1 worked_area delhi
employee    devloper2   id1 worked_area kolkata"""

from collections import defaultdict

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))

for line in data.splitlines():
    k1, k2, k3, k4, v = line.split()
    tree[k1][k2][k3][k4].append(v)

print(tree)
# defaultdict(<function <lambda> at 0x7f2e771cd7d0>, {'employee': defaultdict(<function <lambda> at 0x7f2e771cdf50>, {'developer1': defaultdict(<function <lambda> at 0x7f2e771cf050>, {'id2': defaultdict(<type 'list'>, {'worked_area': ['kolkata'], 'language': ['python', 'c++']}), 'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})}), 'devloper2': defaultdict(<function <lambda> at 0x7f2e771cf0c8>, {'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})})})})

print(tree['employee']['developer1']['id2']['language'])
# ['python', 'c++']

print(tree['employee']['developerX']['idX']['language'])
# []
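One caveat about the last lookup, sketched below under stated assumptions: indexing a defaultdict with a missing key does not just return the default, it also stores it, so the 'developerX' and 'idX' branches would remain in the tree and appear in any later dump of it. The helper lookup is a hypothetical name, not part of any library; it walks the keys with membership tests so that nothing is inserted.

```python
from collections import defaultdict

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
tree['employee']['developer1']['id1']['language'].append('c')

# Indexing with a missing key inserts that key as a side effect.
_ = tree['employee']['developerX']['idX']['language']
print('developerX' in tree['employee'])
# True


def lookup(node, *keys):
    """Hypothetical helper: follow keys without creating missing entries."""
    for key in keys:
        # "key in node" does not trigger the default factory
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(lookup(tree, 'employee', 'developerY', 'idY', 'language'))
# None
print('developerY' in tree['employee'])
# False
```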

To inspect the structure of the tree, you can use json.dumps:

import json
print(json.dumps(tree, indent=4))

It prints:

{
    "employee": {
        "developer1": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            },
            "id2": {
                "language": [
                    "python",
                    "c++"
                ],
                "worked_area": [
                    "kolkata"
                ]
            }
        },
        "devloper2": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            }
        }
    }
}

Because a defaultdict is also a dict, you can iterate over its values exactly as you proposed.
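If the exact shape from the question is wanted (plain dicts with tuples at the leaves), the tree can be converted once it is fully populated. This is a minimal sketch, assuming the leaves are lists; to_plain is a hypothetical helper name, and the small tree built here only stands in for the real one.

```python
from collections import defaultdict


def to_plain(node):
    """Hypothetical helper: turn defaultdicts into dicts and lists into tuples."""
    if isinstance(node, dict):
        return {key: to_plain(value) for key, value in node.items()}
    if isinstance(node, list):
        return tuple(node)
    return node


tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
tree['employee']['developer1']['id2']['language'].extend(['python', 'c++'])
tree['employee']['developer1']['id2']['worked_area'].append('kolkata')

d = to_plain(tree)

# The loop from the question works unchanged on the converted structure.
for k1, v1 in d.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, v5 in v3.items():
                print(k1, k2, k3, k4, v5)
# employee developer1 id2 language ('python', 'c++')
# employee developer1 id2 worked_area ('kolkata',)
```

Converting also removes the auto-insert behaviour, so a missing key in d raises KeyError again instead of silently creating an empty branch.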
