Python: deduplicate/merge a list of dictionaries

Asked 2025-04-17 21:59

Suppose I have a list of dictionaries:

list = [{'name':'john','age':'28','location':'hawaii','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'parker','age':'24','location':'new york','gender':'male'}]

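Here, 'name' can be treated as a unique identifier. My goal is not only to remove exact duplicate dictionaries from this list (e.g. list[1] and list[2]), but also to merge the differing values that share the same 'name' (e.g. list[0] with list[1] or list[2]). In other words, I want to merge all dictionaries with 'name'='john' into a single dictionary, like this: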

dedup_list = [{'name':'john','age':'28; 32','location':'hawaii; colorado','gender':'male'},
              {'name':'parker','age':'24','location':'new york','gender':'male'} ]

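So far, I have tried creating a second list, dedup_list, and iterating over the first list. If no dictionary in dedup_list has that 'name' yet, I add the dictionary. But I'm stuck on the merging part.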

dedup_list = []
for d in list:  # `d` rather than `dict`, which would shadow the built-in
    for new_dict in dedup_list:
        if new_dict['name'] == d['name']:
            pass  # MERGE OTHER DICT FIELDS HERE
        else:
            dedup_list.append(d)  # This will create duplicate values as it iterates through each row of the dedup_list.  I can throw them in a set later to remove?
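For reference, here is one way the nested-loop attempt above could be completed. This is a sketch, not the asker's code. It assumes every value is a string that does not itself contain the separator '; '. It renames the sample list to `records` so it does not shadow the built-in `list`, and `dedup_merge` is a hypothetical helper name. The loop's `else` clause replaces the "append, then dedupe later" idea, which would not work anyway because dicts are unhashable and cannot go into a set.

```python
# Sample data from the question, renamed from `list` to avoid shadowing the built-in.
records = [
    {'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]


def dedup_merge(records, key='name', sep='; '):
    """O(n^2) merge: returns one dict per distinct value of `key` (assumes string values)."""
    dedup_list = []
    for record in records:
        for merged in dedup_list:
            if merged[key] == record[key]:
                # Same identifier: append each value not already recorded for that field.
                for field, value in record.items():
                    existing = merged.get(field)
                    if existing is None:
                        merged[field] = value
                    elif value not in existing.split(sep):
                        merged[field] = existing + sep + value
                break  # match found, stop scanning dedup_list
        else:
            # for/else: runs only if the inner loop did not break, i.e. no match was found.
            dedup_list.append(dict(record))  # copy so the input dicts are not mutated
    return dedup_list


print(dedup_merge(records))
```

For fewer than 100 items the quadratic scan is harmless. Exact duplicates such as list[1] and list[2] collapse because each value is appended only if it is not already present.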

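My list of dictionaries will never exceed 100 items, so an O(n^2) solution is acceptable, though not necessarily ideal. The dedup_list will eventually be written to a CSV file, so I'd be very happy to hear about solutions that cover that too.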

Thanks!

Tags: data structures, data processing, dictionary merging, time complexity, deduplication, list iteration, unique identifier, csv file

1 Answer

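Well, I was going to put together a solution built around defaultdict, but the approach @hivert proposed is probably as good as anything I would come up with. See that answer: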

from collections import defaultdict

dicts = [{'a':1, 'b':2, 'c':3},
         {'a':1, 'd':2, 'c':'foo'},
         {'e':57, 'c':3} ]

super_dict = defaultdict(set)  # uses set to avoid duplicates

for d in dicts:
    for k, v in d.items():  # Python 3: items() replaces Python 2's iteritems()
        super_dict[k].add(v)  # collect every distinct value seen for key k

In other words, I'd suggest closing this question as a duplicate of that one.
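The snippet above merges every dictionary into a single super_dict and does not group by 'name'. Below is a minimal sketch of the same defaultdict-of-sets idea applied per 'name', using the question's data (renamed to `records`). This adaptation is my own and is not part of the linked answer.

```python
from collections import defaultdict

records = [
    {'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]

# name -> field -> set of distinct values seen for that field
grouped = defaultdict(lambda: defaultdict(set))
for d in records:
    for k, v in d.items():
        grouped[d['name']][k].add(v)

for name, fields in grouped.items():
    # sorted() only makes the printout deterministic; the sets themselves are unordered
    print(name, {k: sorted(v) for k, v in fields.items()})
```

This is a single pass over the data, so it is O(n) on average instead of the O(n^2) nested loop.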

Note: you won't get values like '28; 32'. Instead you get a set such as {'28', '32'}, which you can then format however you need for the CSV file.
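One way to get from grouped values to the '28; 32' strings the question asks for is sketched below. It swaps the set for a plain dict used as an insertion-ordered set (dicts keep insertion order in Python 3.7+), so values appear in first-seen order ('hawaii; colorado' rather than an arbitrary set order). `merge_by_name` is a hypothetical helper name, and the '; ' separator comes from the question's desired output.

```python
from collections import defaultdict

records = [
    {'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]


def merge_by_name(records, key='name', sep='; '):
    # key value -> field -> dict used as an insertion-ordered set of values
    grouped = defaultdict(lambda: defaultdict(dict))
    for d in records:
        for k, v in d.items():
            grouped[d[key]][k][v] = None  # dict keys dedupe and keep first-seen order
    # Flatten each field's distinct values into one string, e.g. '28; 32'
    return [
        {k: sep.join(str(v) for v in values) for k, values in fields.items()}
        for fields in grouped.values()
    ]


print(merge_by_name(records))
```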

Note 2: for writing the CSV file, take a look at the csv.DictWriter class.
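A minimal csv.DictWriter sketch for the merged rows follows. It writes to an in-memory io.StringIO buffer so it runs anywhere. `write_rows` is a hypothetical helper name, and the column order is simply taken from the question's dictionaries.

```python
import csv
import io

dedup_list = [
    {'name': 'john', 'age': '28; 32', 'location': 'hawaii; colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]


def write_rows(rows, f, fieldnames=('name', 'age', 'location', 'gender')):
    # DictWriter maps each dict to a row using the given column order.
    writer = csv.DictWriter(f, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)


# In-memory buffer for demonstration; for a real file, something like
# open('dedup.csv', 'w', newline='') works (the filename is only an example).
buf = io.StringIO()
write_rows(dedup_list, buf)
print(buf.getvalue())
```

The csv module docs recommend opening files with newline='' so the writer controls line endings. Because '; ' is not the CSV delimiter, the joined values are written without quoting.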

Answered 2025-04-17 by Python大师