How do I filter a dictionary by value?

5 votes

5 answers

12378 views

Asked 2025-04-15 13:26

This is a beginner question, so please bear with me.

Say I have a dictionary that looks like this:

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

I want to put all the entries whose values are equal to each other into another dictionary.

matched = {"2323232838": ("first/dir", "hello.txt"),
           "3434221":    ("first/dir", "hello.txt"),
           "32232334":   ("first/dir", "hello.txt")}

The remaining entries, which have no match, should look like this:

remainder = {"2323221383": ("second/dir", "foo.txt"),
             "324234324":  ("third/dir", "dog.txt")}

Thanks in advance. If you can provide an example, please comment it as thoroughly as you can.

Tags: data structures, programming tips, value comparison, dictionary filtering

5 Answers

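In Python, iterating over a dictionary works much like iterating over a list, except that the loop variable takes each key in turn: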

for key in dic:
    print("dic[%s] = %s" % (key, dic[key]))

This prints every key and value in your dictionary.
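
To connect that loop to the question, here is a minimal sketch that iterates with `items()`, counts how often each value occurs, and then splits the dictionary. The dictionary `a` is taken from the question; the name `counts` and the two-pass approach are just one way to do it, not the only one.

```python
# Dictionary from the question
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# First pass: count how many keys share each value.
# Tuples are hashable, so they can be used as dictionary keys.
counts = {}
for key, value in a.items():
    counts[value] = counts.get(value, 0) + 1

# Second pass: split the entries by how often their value occurs
matched = {k: v for k, v in a.items() if counts[v] > 1}
remainder = {k: v for k, v in a.items() if counts[v] == 1}
```

Note that this only works when the values are hashable, as the tuples in the question are.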

Answered 2025-04-15 by Python大师


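What you are describing is called an "inverted index": each distinct value is recorded only once, together with a list of the keys that map to it.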

>>> from collections import defaultdict
>>> a = {"2323232838": ("first/dir", "hello.txt"),
...      "2323221383": ("second/dir", "foo.txt"),
...      "3434221": ("first/dir", "hello.txt"),
...      "32232334": ("first/dir", "hello.txt"),
...      "324234324": ("third/dir", "dog.txt")}
>>> invert = defaultdict( list )
>>> for key, value in a.items():
...     invert[value].append( key )
... 
>>> invert
defaultdict(<type 'list'>, {('first/dir', 'hello.txt'): ['3434221', '2323232838', '32232334'], ('second/dir', 'foo.txt'): ['2323221383'], ('third/dir', 'dog.txt'): ['324234324']})

The inverted dictionary maps each original value to a list of one or more keys.

Now let's see how to get the dictionaries you want from it.

Filter:

>>> [ invert[multi] for multi in invert if len(invert[multi]) > 1 ]
[['3434221', '2323232838', '32232334']]
>>> [ invert[uni] for uni in invert if len(invert[uni]) == 1 ]
[['2323221383'], ['324234324']]

Expand:

>>> [ (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] ]
[('3434221', ('first/dir', 'hello.txt')), ('2323232838', ('first/dir', 'hello.txt')), ('32232334', ('first/dir', 'hello.txt'))]
>>> dict( (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] )
{'3434221': ('first/dir', 'hello.txt'), '2323232838': ('first/dir', 'hello.txt'), '32232334': ('first/dir', 'hello.txt')}

A similar (but simpler) approach works for the values that occur only once.
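
For the values that occur only once, a sketch along these lines should do. It rebuilds `invert` exactly as in the session above; the comprehensions are one possible way to write it, and the variable names `matched` and `remainder` are borrowed from the question.

```python
from collections import defaultdict

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Build the inverted index: value -> list of keys that carry it
invert = defaultdict(list)
for key, value in a.items():
    invert[value].append(key)

# A value with exactly one key has no match, so its single key
# goes straight back into the remainder dictionary
remainder = {keys[0]: value
             for value, keys in invert.items() if len(keys) == 1}

# A value with several keys is expanded back to one entry per key
matched = {key: value
           for value, keys in invert.items() if len(keys) > 1
           for key in keys}
```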

Answered 2025-04-15 by Python大师


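The code below produces two variables, matched and remainder. matched is a list of dictionaries, with one element for each group of entries in the original dictionary that share a value. remainder is a single dictionary holding all the entries that had no match, just as in your example.

Note that your example has only one group of matching values: ('first/dir', 'hello.txt'). If there were several groups, each one would get its own entry in matched.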

import itertools

# Original dict
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Convert dict to sorted list of items
a = sorted(a.items(), key=lambda x:x[1])

# Group by value of tuple
groups = itertools.groupby(a, key=lambda x:x[1])

# Pull out matching groups of items, and combine items   
# with no matches back into a single dictionary
remainder = []
matched   = []

for key, group in groups:
    group = list(group)
    if len(group) == 1:
        remainder.append(group[0])
    else:
        matched.append(dict(group))

# Turn the unmatched (key, value) pairs back into a dictionary
remainder = dict(remainder)

Output:

>>> matched
[
  {
    '3434221':    ('first/dir', 'hello.txt'), 
    '2323232838': ('first/dir', 'hello.txt'), 
    '32232334':   ('first/dir', 'hello.txt')
  }
]

>>> remainder
{
  '2323221383': ('second/dir', 'foo.txt'), 
  '324234324':  ('third/dir', 'dog.txt')
}
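
The sorting step in the code above matters because `itertools.groupby` only groups consecutive items with the same key. The small sketch below uses a made-up list, `values`, to show the difference; it is an illustration and not part of the solution itself.

```python
import itertools

# Hypothetical sample data: "x" appears twice, but not side by side
values = ["x", "y", "x"]

# Without sorting, the two "x" items land in separate groups
unsorted_groups = [(key, len(list(group)))
                   for key, group in itertools.groupby(values)]

# After sorting, equal items are adjacent and form a single group
sorted_groups = [(key, len(list(group)))
                 for key, group in itertools.groupby(sorted(values))]

print(unsorted_groups)  # [('x', 1), ('y', 1), ('x', 1)]
print(sorted_groups)    # [('x', 2), ('y', 1)]
```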

As a beginner, you may run into some unfamiliar concepts in the code above. Here are some links that can help you understand them:

Answered 2025-04-15 by Python大师

