Removing duplicate JSON objects from a list in Python

Published 2024-04-19 18:15:00


I have a list of dicts in which one particular entry is repeated several times, and I want to remove the duplicates.

My list:

te = [
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      }
    ]

The function that removes duplicates:

def removeduplicate(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)

When I call this function, I get a generator object:

<generator object removeduplicate at 0x0170B6E8>

When I try to iterate over the generator, I get TypeError: unhashable type: 'dict'.

Is there a way to remove the duplicates, or to iterate over the generator?


3 answers

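You can still use a set for duplicate detection; you just need to convert each dictionary into something hashable, such as a tuple. A dictionary d can be converted with tuple(d.items()), provided its values are themselves hashable. Applied to your generator function: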

def removeduplicate(it):
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            yield x
            seen.add(t)

>>> for d in removeduplicate(te):
...    print(d)
{'phone': 'None', 'Name': 'Bala'}

>>> te.append({'Name': 'Bala', 'phone': '1234567890'})
>>> te.append({'Name': 'Someone', 'phone': '1234567890'})

>>> for d in removeduplicate(te):
...    print(d)
{'phone': 'None', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Someone'}
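
One caveat with tuple(x.items()): the tuple follows each dict's key order, so two dicts with equal contents but a different key order produce different tuples and are not treated as duplicates. A minimal sketch of an order-insensitive variant, assuming every value is hashable (the name remove_duplicates_unordered is made up for this illustration):

```python
def remove_duplicates_unordered(dicts):
    """Yield each distinct dict once, ignoring key order.

    Assumes every value is hashable (str, int, tuple, ...).
    """
    seen = set()
    for d in dicts:
        key = frozenset(d.items())  # hashable and independent of key order
        if key not in seen:
            seen.add(key)
            yield d


records = [
    {"Name": "Bala", "phone": "None"},
    {"phone": "None", "Name": "Bala"},  # same content, different key order
    {"Name": "Someone", "phone": "1234567890"},
]
unique = list(remove_duplicates_unordered(records))
print(len(unique))  # 2
```

Sorting the items, tuple(sorted(d.items())), would be another option when the keys are mutually comparable.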

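Using a set gives faster lookups (O(1) on average) than a "seen" list (O(n)). Whether converting every dict to a tuple is worth the extra computation depends on how many dictionaries you have and how many of them are duplicates. If there are many distinct dicts, a "seen" list grows quite large, and testing whether a dict has already been seen can become an expensive operation. That may justify the tuple conversion, but you would have to test and profile it.

The error occurs because a dict cannot be added to a set. From this question: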
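
A rough way to run that comparison is with timeit. The sketch below is only an illustration: the data set (1000 dicts, 250 distinct) and the function names are assumptions, and the timings will vary by machine and by data.

```python
import timeit


def dedupe_with_set(dicts):
    """Set-based variant: hash a tuple of the items."""
    seen = set()
    for d in dicts:
        key = tuple(d.items())
        if key not in seen:
            seen.add(key)
            yield d


def dedupe_with_list(dicts):
    """List-based variant: compare against every dict seen so far."""
    seen = []
    for d in dicts:
        if d not in seen:
            seen.append(d)
            yield d


# Hypothetical test data: 1000 dicts, 250 distinct ones.
data = [{"Name": "user%d" % (i % 250), "phone": "None"} for i in range(1000)]

set_time = timeit.timeit(lambda: list(dedupe_with_set(data)), number=3)
list_time = timeit.timeit(lambda: list(dedupe_with_list(data)), number=3)
print("set-based:  %.4f s" % set_time)
print("list-based: %.4f s" % list_time)
```

Changing the ratio of distinct to duplicate dicts in data shows how the gap between the two approaches depends on the input.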

You're trying to use a dict as a key to another dict or in a set. That does not work because the keys have to be hashable.

As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible).

>>> foo = dict()
>>> bar = set()
>>> bar.add(foo)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> 
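
To make the rule concrete, here is a small sketch that checks which representations of a record can go into a set. The helper is_hashable is a made-up name for this illustration, not a standard function.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


record = {"Name": "Bala", "phone": "None"}

print(is_hashable(record))                     # False: dicts are mutable
print(is_hashable(tuple(record.items())))      # True: tuple of (str, str) pairs
print(is_hashable(frozenset(record.items())))  # True
print(is_hashable(([1, 2],)))                  # False: tuple containing a list
```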

Instead, since you are already testing if x not in seen, simply use a list:

>>> te = [
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       },
...       {
...         "Name": "Bala",
...         "phone": "None"
...       }
...     ]

>>> def removeduplicate(it):
...     seen = []
...     for x in it:
...         if x not in seen:
...             yield x
...             seen.append(x)

>>> removeduplicate(te)
<generator object removeduplicate at 0x7f3578c71ca8>

>>> list(removeduplicate(te))
[{'phone': 'None', 'Name': 'Bala'}]
>>> 
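
The list version also works when the dicts contain nested lists or dicts, at the cost of a linear scan per item. If the data really is JSON-like, one possible alternative that keeps set lookups is to use a canonical JSON string as the key. This is a sketch under the assumption that every value is JSON-serializable; remove_duplicates_json is a made-up name.

```python
import json


def remove_duplicates_json(dicts):
    """Yield each distinct dict once, using its JSON text as the set key."""
    seen = set()
    for d in dicts:
        # sort_keys=True makes the string independent of key order
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            yield d


nested = [
    {"Name": "Bala", "phones": ["123", "456"]},
    {"phones": ["123", "456"], "Name": "Bala"},  # same content, reordered keys
    {"Name": "Bala", "phones": ["123"]},
]
result = list(remove_duplicates_json(nested))
print(len(result))  # 2
```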

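You can easily remove duplicates with a dict comprehension, because a dictionary cannot contain duplicate keys. Note that this compares only the Name value, and a later dict replaces an earlier one with the same Name: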

te = [
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala",
        "phone": "None"
      },
      {
        "Name": "Bala1",
        "phone": "None"
      }
    ]

unique = list({each['Name']: each for each in te}.values())

print(unique)

Output (Python 3.7+, where dicts keep insertion order):

[{'Name': 'Bala', 'phone': 'None'}, {'Name': 'Bala1', 'phone': 'None'}]
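
The dict comprehension keeps the last dict for each Name. If the first occurrence should win instead, a minimal sketch using dict.setdefault could look like this (unique_by and the sample data are made up for this illustration):

```python
def unique_by(dicts, field):
    """Return the first dict seen for each distinct value of `field`."""
    first_seen = {}
    for d in dicts:
        # setdefault stores d only if this field value has not been seen yet
        first_seen.setdefault(d[field], d)
    return list(first_seen.values())


people = [
    {"Name": "Bala", "phone": "None"},
    {"Name": "Bala", "phone": "1234567890"},  # dropped: Name already seen
    {"Name": "Bala1", "phone": "None"},
]
print(unique_by(people, "Name"))
```

As with the comprehension, two entries that share a Name but differ in phone are collapsed into one, so this only suits data where Name identifies a record.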
