Finding identical key/value pairs in a list of dicts without nested loops

Posted 2024-04-29 10:57:45

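I am doing a very simple computation: finding the entries in a list of dicts that share identical key/value pairs and combining them by summing. Suppose the data is: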

Edit: name and id are arbitrary names. In practice I have a very large dict in which I use multiple keys.

Input

{
  "name":"first",
  "id":"1234",
  "quantity":10
},
{
  "name":"first",
  "id":"1234",
  "quantity":30
},
{
  "name":"another",
  "id":"0000",
  "quantity":10
}

Output

{
  "name":"first",
  "id":"1234",
  "quantity":40
},
{
  "name":"another",
  "id":"0000",
  "quantity":10
}

I would love to learn how to do this in a "pythonic" way, avoiding nested loops as far as possible.

Right now I have something I am not happy with:

for entry in quantities:
    for compare in quantities:
        if id(entry) != id(compare):
            if (entry["name"] == compare["name"]) and (entry["id"] == compare["id"]):
                entry["quantity"] = entry["quantity"] + compare["quantity"]
                quantities.remove(compare)

Any hints/suggestions would be greatly appreciated, thanks.


2 answers

Use another dictionary and group on your keys, by which I mean "name" and "id" (although, isn't "id" alone enough? If it isn't, the name is misleading).

For example:

grouper = {}
for q in quantities:
    key = q['name'], q['id']
    if key in grouper:
        grouper[key]['quantity'] += q['quantity']
    else:
        grouper[key] = q.copy()
quantities = list(grouper.values())

In a REPL:

In [1]: quantities = [
   ...: {
   ...:   "name":"first",
   ...:   "id":"1234",
   ...:   "quantity":10
   ...: },
   ...: {
   ...:   "name":"first",
   ...:   "id":"1234",
   ...:   "quantity":30
   ...: },
   ...: {
   ...:   "name":"another",
   ...:   "id":"0000",
   ...:   "quantity":10
   ...: }
   ...: ]

In [2]: grouper = {}

In [3]: for q in quantities:
   ...:     key = q['name'], q['id']
   ...:     if key in grouper:
   ...:         grouper[key]['quantity'] += q['quantity']
   ...:     else:
   ...:         grouper[key] = q.copy()
   ...:

In [4]: grouper
Out[4]:
{('first', '1234'): {'name': 'first', 'id': '1234', 'quantity': 40},
 ('another', '0000'): {'name': 'another', 'id': '0000', 'quantity': 10}}

You can then get the new list directly from the values:

In [5]: list(grouper.values())
Out[5]:
[{'name': 'first', 'id': '1234', 'quantity': 40},
 {'name': 'another', 'id': '0000', 'quantity': 10}]

This approach takes linear time and linear space.

Note that q.copy() creates a shallow copy, which is fine here, but may not be if your dicts contain mutable values.
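To illustrate the shallow-copy caveat, here is a minimal sketch. The "tags" field is hypothetical and not part of the original data; it stands in for any mutable value stored in a record.

```python
import copy

# "tags" is a hypothetical mutable field added for illustration only.
original = {"name": "first", "id": "1234", "quantity": 10, "tags": ["a"]}

shallow = original.copy()        # new dict, but the values are shared
deep = copy.deepcopy(original)   # new dict with recursively copied values

shallow["quantity"] += 30        # rebinds an int in the copy; original is untouched
shallow["tags"].append("b")      # mutates the list object shared with original

print(original["quantity"])  # 10
print(original["tags"])      # ['a', 'b']
print(deep["tags"])          # ['a']
```

With only strings and integers as values, as in the question, the shallow copy is sufficient; copy.deepcopy is one option if the records hold lists or nested dicts.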

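Also note that you may want to reconsider your data structure. Do you really want a list? If you have a unique key and want to be able to find objects quickly by that key, you probably want some kind of dict.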
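As a rough sketch of that idea, the totals could be kept in a dict keyed by the (name, id) tuple instead of a list. The name by_key is an assumption made for this example.

```python
quantities = [
    {"name": "first", "id": "1234", "quantity": 10},
    {"name": "first", "id": "1234", "quantity": 30},
    {"name": "another", "id": "0000", "quantity": 10},
]

# Map each (name, id) pair to its running total in a single pass.
by_key = {}
for q in quantities:
    key = (q["name"], q["id"])
    by_key[key] = by_key.get(key, 0) + q["quantity"]

# Lookup by key is a constant-time dict access on average.
print(by_key[("first", "1234")])    # 40
print(by_key[("another", "0000")])  # 10
```

Whether this fits depends on how the data is consumed later; if downstream code expects a list of dicts, the list form from the answer above is still the one to use.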

Approach 1: using groupby and reduce

from itertools import groupby
from functools import reduce

def merge(d1, d2):
    ' merge two dictionaries based upon summing key values not in grouper '
    return {k:v if k in grouper else v + d2.get(k, 0) for k, v in d1.items()}

grouper = ("name", "id")  # keys to groupby
lst.sort(key = lambda d:[d[key] for key in grouper])  # Sort list inplace based upon grouper keys
                                                      # Done inplace to save space
# Merge dicts in list in same group based upon merge function
outputlist = [(reduce(merge, g)) for _, g in groupby(lst, lambda d:[d[key] for key in grouper])]
    

[{'name': 'another', 'id': '0000', 'quantity': 10},
 {'name': 'first', 'id': '1234', 'quantity': 40}]
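For comparison, here is a self-contained variant of the same sort-then-group idea that can be run as is. It is a sketch, not the answer's code: merge_groups and its parameters are names introduced here, it sums a single value key with sum() rather than reduce, and it assumes at least two group keys so that itemgetter returns a tuple.

```python
from itertools import groupby
from operator import itemgetter


def merge_groups(records, group_keys=("name", "id"), value_key="quantity"):
    """Sum value_key over records that share the same group_keys values."""
    key_func = itemgetter(*group_keys)  # returns a tuple for two or more keys
    merged = []
    # groupby only merges adjacent items, so the records are sorted first.
    for key, group in groupby(sorted(records, key=key_func), key=key_func):
        row = dict(zip(group_keys, key))
        row[value_key] = sum(rec[value_key] for rec in group)
        merged.append(row)
    return merged


lst = [
    {"name": "first", "id": "1234", "quantity": 10},
    {"name": "first", "id": "1234", "quantity": 30},
    {"name": "another", "id": "0000", "quantity": 10},
]
outputlist = merge_groups(lst)
print(outputlist)
```

Because of the sort, this runs in O(n log n) time rather than the linear time of the dictionary approach, and the result comes back ordered by the group keys. Unlike the in-place lst.sort(), sorted() leaves the input list unchanged at the cost of an extra copy.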

Approach 2: using pandas

A one-liner that avoids all explicit loops (it essentially replicates approach 1)

import pandas as pd
outputlist = pd.DataFrame(lst).groupby(['name', 'id']).sum().reset_index().to_dict('records')

Output list:

[{'name': 'another', 'id': '0000', 'quantity': 10},
 {'name': 'first', 'id': '1234', 'quantity': 40}]

Explanation

pd.DataFrame(lst)            - generate pandas DataFrame from list of dictionaries
groupby(['name', 'id'])      - group rows by name & id
sum()                        - sum the non-grouped values in each group
reset_index()                - reset index back to 0, 1, 2, ...
to_dict('records')           - convert to list of dictionaries 
                               with each row data as dictionary
                               
