Am I reinventing the wheel with this deduplication function?

with_duplicates = [
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "john.smith@gmail.com",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    },
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "john.smith@gmail.com",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    }
]
without_duplicates = deduplicate_list(with_duplicates, key='id')
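The question does not include the body of `deduplicate_list`. Purely for reference, here is a minimal sketch of what such a function could look like, assuming it should keep the first dict seen for each value of `key` and preserve the original order. This is a hypothetical implementation, not the asker's code:

```python
def deduplicate_list(items, key):
    """Hypothetical sketch: keep the first dict for each value of item[key]."""
    seen = set()   # key values that have already been emitted
    result = []
    for item in items:
        value = item[key]  # assumes every dict has the key and its value is hashable
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


with_duplicates = [
    {"type": "users", "attributes": {"handle": "jsmith"}, "id": "1234"},
    {"type": "users", "attributes": {"handle": "jsmith"}, "id": "1234"},
]
without_duplicates = deduplicate_list(with_duplicates, key="id")
print(len(without_duplicates))  # 1
```

Because it needs no sort, this runs in linear time and keeps the input order.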

3 answers

User

Answer #1 · edited 2024-05-16 01:40:32

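For each distinct value of key, you only pick the first dict in the list. itertools.groupby is a standard-library tool that does this for you: sort and group by key, then take only the first element of each group: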

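from itertools import groupby

def deduplicate(lst, key):
    # d.get(key) tolerates dicts without the key (they map to None), but sorted()
    # then raises TypeError if None has to be compared with, e.g., a str key
    fnc = lambda d: d.get(key)
    # sort so that equal keys are adjacent, then keep the first dict of each group
    return [next(g) for k, g in groupby(sorted(lst, key=fnc), key=fnc)]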
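One side effect worth knowing: because `sorted` is stable, each group still yields the dict that appeared first in the input, but the result as a whole comes back ordered by key rather than by original position. A small check with made-up records:

```python
from itertools import groupby


def deduplicate(lst, key):
    fnc = lambda d: d.get(key)
    # sorting puts equal keys next to each other; groupby then yields one group per key
    return [next(g) for k, g in groupby(sorted(lst, key=fnc), key=fnc)]


records = [
    {"id": "2", "name": "first 2"},
    {"id": "1", "name": "only 1"},
    {"id": "2", "name": "second 2"},
]
result = deduplicate(records, "id")
print([r["name"] for r in result])  # ['only 1', 'first 2']
```

If the original order matters, a single pass with a set of already-seen keys avoids the sort entirely.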

User

Answer #2 · edited 2024-05-16 01:40:32

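This answer addresses a more general problem: finding unique elements not by a single attribute (in your case id), but treating two elements as different if any of their nested attributes differ.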

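The code below returns a list of the indices of the unique elements (for each group of duplicates, the index of its last occurrence):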

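def make_hash(o):
    """
    Makes a hash from a dictionary, list, tuple or set to any level, that
    contains only other hashable types (including any lists, tuples, sets,
    and dictionaries).
    """
    if isinstance(o, (set, frozenset)):
        # sets are unordered, so combine the element hashes order-insensitively
        return hash(frozenset(make_hash(e) for e in o))
    if isinstance(o, (tuple, list)):
        # sequences are ordered: hash the element hashes as a tuple
        return hash(tuple(make_hash(e) for e in o))
    if not isinstance(o, dict):
        # leaf value (str, int, ...): use the built-in hash
        return hash(o)
    # dict: hash every value recursively, then hash the (key, value-hash) pairs
    # as a frozenset so that key order does not matter; building a new
    # collection means o is never mutated, so no deepcopy is needed
    return hash(frozenset((k, make_hash(v)) for k, v in o.items()))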

l = [
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "john.smith@gmail.com",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    },
    {
        "type": "users",
        "attributes": {
            "first-name": "AAA",
            "email": "aaa.aaah@gmail.com",
            "last-name": "XXX",
            "handle": "jsmith"
        },
        "id": "1234"
    },
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "john.smith@gmail.com",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    },
]

# get the indices of the unique elements (the last occurrence of each one);
# sorted() makes the output independent of dict ordering
print(sorted({make_hash(x): i for i, x in enumerate(l)}.values()))  # [1, 2]
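`make_hash` compares elements through `hash()` values, so two different elements could in principle collide and be treated as duplicates. A variant that avoids this uses a canonical JSON string as the dict key instead. This is a swapped-in technique, not the answer's code, and it assumes every element is JSON-serializable: sets make `json.dumps` raise `TypeError`, and tuples serialize exactly like lists. `unique_indices` is a hypothetical name, and unlike the code above it keeps the first occurrence:

```python
import json


def unique_indices(items):
    """Return the index of the first occurrence of each distinct element.

    Hypothetical helper: two elements count as equal when their canonical
    JSON forms match, so every element must be JSON-serializable.
    """
    first_seen = {}  # canonical JSON string -> index of first occurrence
    for i, item in enumerate(items):
        # sort_keys=True makes dict key order irrelevant; list order still matters
        canonical = json.dumps(item, sort_keys=True)
        first_seen.setdefault(canonical, i)
    return list(first_seen.values())


users = [
    {"id": "1234", "attributes": {"first-name": "John", "last-name": "Smith"}},
    {"id": "1234", "attributes": {"first-name": "AAA", "last-name": "XXX"}},
    {"attributes": {"last-name": "Smith", "first-name": "John"}, "id": "1234"},
]
print(unique_indices(users))  # [0, 1]
```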

User

Answer #3 · edited 2024-05-16 01:40:32

You could try a shorter version, based on the answer you linked in your question:

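key = "id"
# keep an element only if no later element has the same key,
# so for each group of duplicates only the last occurrence survives
deduplicated = [val for ind, val in enumerate(l)
                if val[key] not in [tmp[key] for tmp in l[ind + 1:]]]
print(deduplicated)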

Note that this keeps the last element of each group of duplicates.
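The list comprehension rescans the rest of the list for every element, so it is O(n²). A linear-time sketch that gives the same result (the last occurrence of each key, in the position where it last appears) walks the list backwards with a set of seen keys. It assumes every dict has the key and that its value is hashable; `dedupe_keep_last` is a hypothetical name:

```python
def dedupe_keep_last(items, key):
    """Hypothetical helper: keep the last dict for each value of item[key]."""
    seen = set()   # key values already kept, scanning from the end
    kept = []
    for item in reversed(items):
        value = item[key]
        if value not in seen:
            seen.add(value)
            kept.append(item)
    kept.reverse()  # restore the original left-to-right order
    return kept


items = [
    {"id": "1", "tag": "a1"},
    {"id": "2", "tag": "b"},
    {"id": "1", "tag": "a2"},
]
print([d["tag"] for d in dedupe_keep_last(items, "id")])  # ['b', 'a2']
```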
