Optimized way to create a unique list of dicts

Posted 2024-06-16 10:23:50


I need an efficient way to build a list of unique dicts in Python.

I have a list like this:

[
   {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
   {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
   {'name': 'John', 'hobbies': ['Gardening', 'Swimming']}
]

and I need output like this:

[
   {'name': 'John', 'hobbies': ['Reading', 'Swimming', 'Gardening']},
   {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
]

As you can see, I need to build a set of hobbies for each name and produce a list with one dict per name.

This is what I tried:

{v['_id']['route']: v for v in routes_list}.values()

But it does not take care of creating the combined set of hobbies.
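To illustrate the problem, here is a minimal sketch of the same dict-comprehension idea applied to the sample data. Keying by 'name' and the variable names `people` and `deduped` are assumptions made for this illustration; the snippet above keys by v['_id']['route'] instead. A later dict with the same key simply replaces the earlier one, so the earlier hobbies are lost.

```python
# Hypothetical adaptation of the attempt above to the sample data.
people = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']},
]

# Each name maps to the LAST dict seen for it; nothing is merged.
deduped = list({p['name']: p for p in people}.values())
print(deduped)
# [{'name': 'John', 'hobbies': ['Gardening', 'Swimming']},
#  {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']}]
```

John's 'Reading' hobby from the first entry is gone, which is exactly the gap described above.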

Can anyone help me do this in the most efficient way?

Thank you.


2 Answers

Just build an intermediate defaultdict, which lets you do this in linear time, and convert it back to the required structure at the end.

inp = [
   {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
   {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
   {'name': 'John', 'hobbies': ['Gardening', 'Swimming']}
]

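from collections import defaultdict

# name -> set of hobbies; the set removes duplicates automatically
temp = defaultdict(set)
for d in inp:
    temp[d['name']].update(d['hobbies'])

# Convert back to a list of dicts, turning each set into a list
result = [{'name': k, 'hobbies': list(v)} for k, v in temp.items()]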

Output (the order of hobbies within each list may vary, because sets are unordered):

[{'name': 'John', 'hobbies': ['Gardening', 'Reading', 'Swimming']},
 {'name': 'Gina', 'hobbies': ['Cooking', 'Skating']}]
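If the first-seen order of hobbies matters, as in the output requested in the question, one possible variation is to use a plain dict as an ordered set instead of a set. This is only a sketch, not part of the answer above; it assumes Python 3.7+ (where dicts preserve insertion order), and the name `merged` is made up for the example.

```python
inp = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']},
]

# name -> dict whose KEYS are the hobbies, kept in first-seen order.
# (hypothetical helper structure; the values are unused and stay None)
merged = {}
for d in inp:
    # dict.fromkeys builds {hobby: None, ...}; update() ignores order of
    # keys that already exist, so duplicates keep their original position.
    merged.setdefault(d['name'], {}).update(dict.fromkeys(d['hobbies']))

result = [{'name': name, 'hobbies': list(hobbies)}
          for name, hobbies in merged.items()]
print(result)
# [{'name': 'John', 'hobbies': ['Reading', 'Swimming', 'Gardening']},
#  {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']}]
```

This still does one pass over the input with average O(1) dict operations, so it remains linear in the total number of hobbies.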

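If you are OK with changing the output structure to a mapping from names to sets of hobbies, you can do it in linear time (ignoring the edge case of a large number of hash collisions):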

from collections import defaultdict

data = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']}
]

output = defaultdict(set)

for d in data:
    output[d['name']].update(d['hobbies'])

print(output)
# defaultdict(<class 'set'>, {'John': {'Reading', 'Swimming', 'Gardening'},
#                             'Gina': {'Cooking', 'Skating'}})

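If you insist on a list of dicts, you can still get close to linear time (the membership check on each hobbies list is still O(n)) by keeping a dict that maps each name to its index in the output list: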

data = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']}
]

output = []
names_to_indices = {}  # name -> position of that name's dict in output
for d in data:
    if d['name'] not in names_to_indices:
        # Copy the list so that later appends do not mutate the input data
        output.append({'name': d['name'], 'hobbies': list(d['hobbies'])})
        names_to_indices[d['name']] = len(output) - 1
    else:
        index = names_to_indices[d['name']]
        for hobby in d['hobbies']:
            # Linear scan of the existing hobbies list
            if hobby not in output[index]['hobbies']:
                output[index]['hobbies'].append(hobby)
print(output)
# [{'name': 'John', 'hobbies': ['Reading', 'Swimming', 'Gardening']},
#  {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']}]
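The O(n) list scan mentioned above can be avoided while still ending up with lists, by tracking the hobbies already added for each name in a side set. The following is a rough sketch of that idea, not the answer's own code; `by_name`, `entry`, and `seen` are names invented for the example.

```python
data = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']},
]

output = []
# name -> (that name's dict in output, set of hobbies already appended)
by_name = {}
for d in data:
    name = d['name']
    if name not in by_name:
        entry = {'name': name, 'hobbies': []}
        output.append(entry)
        by_name[name] = (entry, set())
    entry, seen = by_name[name]
    for hobby in d['hobbies']:
        if hobby not in seen:  # average O(1) set lookup, no list scan
            seen.add(hobby)
            entry['hobbies'].append(hobby)

print(output)
# [{'name': 'John', 'hobbies': ['Reading', 'Swimming', 'Gardening']},
#  {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']}]
```

The trade-off is extra memory for the side sets, in exchange for keeping both list output and first-seen order.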

If you are fine with the hobbies being a set, you can make it truly linear time (again, ignoring the possibility of excessive hash collisions):

data = [
    {'name': 'John', 'hobbies': ['Reading', 'Swimming']},
    {'name': 'Gina', 'hobbies': ['Skating', 'Cooking']},
    {'name': 'John', 'hobbies': ['Gardening', 'Swimming']}
]

output = []
names_to_indices = {}
for d in data:
    if d['name'] not in names_to_indices:
        output.append({'name': d['name'], 'hobbies': set(d['hobbies'])})
        names_to_indices[d['name']] = len(output) - 1
    else:
        index = names_to_indices[d['name']]
        output[index]['hobbies'].update(d['hobbies'])
print(output)
# [{'name': 'John', 'hobbies': {'Gardening', 'Swimming', 'Reading'}},
#  {'name': 'Gina', 'hobbies': {'Skating', 'Cooking'}}]
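If lists are needed in the end after all, the sets can be converted in one final pass. The sketch below starts from a result shaped like the one printed above and uses sorted() so the order is deterministic; alphabetical order is an assumption here, and `as_lists` is a name made up for the example.

```python
# Result shaped like the output above, with hobbies stored as sets.
output = [
    {'name': 'John', 'hobbies': {'Gardening', 'Swimming', 'Reading'}},
    {'name': 'Gina', 'hobbies': {'Skating', 'Cooking'}},
]

# Build new dicts so the set-based result is left untouched.
# sorted() returns a list, giving a stable (alphabetical) order.
as_lists = [{'name': d['name'], 'hobbies': sorted(d['hobbies'])}
            for d in output]
print(as_lists)
# [{'name': 'John', 'hobbies': ['Gardening', 'Reading', 'Swimming']},
#  {'name': 'Gina', 'hobbies': ['Cooking', 'Skating']}]
```

Note that sorting adds an O(k log k) step per name, where k is that name's number of hobbies; list(d['hobbies']) avoids this but gives an arbitrary order.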
