A Pythonic way to compare two lists and print the unmatched items?

Asked 2025-04-16 03:03

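I have two Python lists of dictionaries, entries9 and entries10. I want to compare their items and put the ones that match into a new list called joint_items. I also want to put the items that have no match into two further lists, unmatched_items_9 and unmatched_items_10.

Here is my code. Getting joint_items and unmatched_items_9 (from the outer list) is simple enough, but how do I get unmatched_items_10 (from the inner list)?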

joint_items = []
unmatched_items_9 = []

for entry1 in entries9:
    match_found = False
    for entry2 in entries10:
        # the dictionaries only have some keys in common, but we care about a and b
        if entry1[a] == entry2[a] and entry1[b] == entry2[b]:
            match_found = True
            joint_items.append(entry1)
            # entries10.remove(entry2)  # Tried this originally, but realised it messes with the original list object!
            break  # stop scanning entries10 after the first match
    if not match_found:
        unmatched_items_9.append(entry1)

Performance is not really a concern, since this is just a one-off script.

3 Answers

The Python standard library has a class called difflib.SequenceMatcher that looks like it could do what you want, though I don't know how to use it!
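
For what it's worth, here is a minimal sketch of how difflib.SequenceMatcher might be applied here. The sample data and the key names "a" and "b" are made up for illustration. Two caveats: SequenceMatcher needs hashable elements, so the sketch compares tuples of the relevant values rather than the dicts themselves, and it aligns the two sequences in order, so an item that appears at very different positions in the two lists may be reported as unmatched.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; "a" and "b" stand in for the two keys of interest.
a, b = "a", "b"
entries9 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}]
entries10 = [{"a": 2, "b": "y"}, {"a": 3, "b": "z"}, {"a": 4, "b": "w"}]

# Dicts are not hashable, so compare tuples of the relevant values instead.
keys9 = [(d[a], d[b]) for d in entries9]
keys10 = [(d[a], d[b]) for d in entries10]

joint_items, unmatched_items_9, unmatched_items_10 = [], [], []
matcher = SequenceMatcher(None, keys9, keys10, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag == "equal":
        joint_items.extend(entries9[i1:i2])
    else:
        # "delete", "insert" and "replace" all describe unmatched stretches;
        # one of the two slices is empty for "delete" and "insert".
        unmatched_items_9.extend(entries9[i1:i2])
        unmatched_items_10.extend(entries10[j1:j2])
```

Because of the ordering caveat, this is probably a better fit for diff-style comparisons than for the order-independent matching the question describes.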

Answered 2025-04-16 by Python大师

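You could use sets and their methods, such as intersection. You would need to convert your dictionaries into something immutable first so that they can go into a set (for example, strings). Would something like this work?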

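import ast

a = set(str(x) for x in entries9)
b = set(str(x) for x in entries10)

# You'll have to change the above lines if you only care about _some_ of the keys.
# Note that str(x) also depends on key order, so equal dicts built in a different order won't match.

joint_items = a.intersection(b)  # items present in both lists (union would return everything)
unmatched_items_9 = a - b        # only in entries9
unmatched_items_10 = b - a       # only in entries10

# Now you can turn them back into dicts
# (ast.literal_eval is a safer alternative to eval for plain literal values):
joint_items        = [ast.literal_eval(i) for i in joint_items]
unmatched_items_9  = [ast.literal_eval(i) for i in unmatched_items_9]
unmatched_items_10 = [ast.literal_eval(i) for i in unmatched_items_10]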
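
If only some of the keys matter, one possible variation is to build the sets from tuples of just those values and keep the original dicts untouched, which also avoids the round trip through strings. This is a rough sketch, not a drop-in solution: the sample data, the literal key names "a" and "b", and the helper key_of are all made up for illustration, and it assumes the compared values are hashable.

```python
# Hypothetical sample data; the dicts share only the keys "a" and "b".
entries9 = [{"a": 1, "b": 2, "x": "foo"}, {"a": 3, "b": 4, "x": "bar"}]
entries10 = [{"a": 1, "b": 2, "y": "baz"}, {"a": 5, "b": 6, "y": "qux"}]

def key_of(d):
    """Return a hashable tuple of the two values we compare on (illustrative helper)."""
    return (d["a"], d["b"])

# Sets of keys give fast membership tests.
keys9 = {key_of(d) for d in entries9}
keys10 = {key_of(d) for d in entries10}

# Filter the original dicts by whether their key appears on the other side.
joint_items = [d for d in entries9 if key_of(d) in keys10]
unmatched_items_9 = [d for d in entries9 if key_of(d) not in keys10]
unmatched_items_10 = [d for d in entries10 if key_of(d) not in keys9]
```

Here joint_items holds the entries9 version of each matched item, as in the question's loop.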

Answered 2025-04-16 by Python大师

What you are doing now, done in the other direction, could be written as:

unmatched_items_10 = [d for d in entries10 if d not in entries9]

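While this is more concise than your approach, it has the same performance problem: the running time is proportional to the product of the two lists' lengths, because each item in one list may be compared against every item in the other. If the lengths you care about are around 9 or 10 (as the names seem to suggest), that is no problem.

For much longer lists, though, you can get better performance by sorting both lists first and then stepping through them "in parallel" (time proportional to N log N, where N is the length of the longer list). There are other approaches to try (increasingly complicated ;-) if even this more advanced one does not meet your performance needs. If you do need better performance, please tell me the typical length of each list and the typical contents of the dicts, since such "details" are key considerations when choosing an algorithm that strikes a good balance between speed and simplicity.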
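
As a rough illustration of the sort-then-step-in-parallel idea, here is a minimal sketch. The function name split_sorted, the sample data and the key names "a" and "b" are hypothetical, and it assumes the compared values can be ordered with < and >.

```python
def split_sorted(left, right, key):
    """Sort both lists by key, then walk them in step (illustrative helper)."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    joint, only_left, only_right = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            only_left.append(left[i])
            i += 1
        elif kl > kr:
            only_right.append(right[j])
            j += 1
        else:
            # Same key on both sides: consume the whole run of equal keys.
            while i < len(left) and key(left[i]) == kl:
                joint.append(left[i])
                i += 1
            while j < len(right) and key(right[j]) == kl:
                j += 1
    # Whatever remains on either side has no partner.
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return joint, only_left, only_right

# Hypothetical sample data.
entries9 = [{"a": 3, "b": 4}, {"a": 1, "b": 2}]
entries10 = [{"a": 5, "b": 6}, {"a": 1, "b": 2, "y": 0}]
joint_items, unmatched_items_9, unmatched_items_10 = split_sorted(
    entries9, entries10, key=lambda d: (d["a"], d["b"]))
```

The two sorts dominate the cost; the walk itself touches each item once. The results come back in sorted key order rather than in the original list order.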

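Edit: the OP has updated the question to say that what matters is not whether two dicts d1 and d2 are equal (which is what the in operator checks), but whether d1[a]==d2[a] and d1[b]==d2[b]. In that case the in operator cannot be used (well, it could with some funky wrapping, but that adds complexity that is best avoided ;-), but the all builtin replaces it handily: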

unmatched_items_10 = [d for d in entries10
                      if all(d[a]!=d2[a] or d[b]!=d2[b] for d2 in entries9)]

I inverted the logic (to != and or, per De Morgan's laws) because we want the dicts that are not matched. However, if you prefer:

unmatched_items_10 = [d for d in entries10
                      if not any(d[a]==d2[a] and d[b]==d2[b] for d2 in entries9)]

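Personally, I don't like if not any and if not all, for stylistic reasons, but the maths are sound (by what the Wikipedia page calls the extensions to De Morgan's laws, since any is an existential quantifier and all is a universal quantifier, so to speak ;-). Performance should be about the same (though the OP did clarify in a comment that performance is not very important for this task).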
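
A quick way to convince yourself that the two forms agree is to run both on a small example. The sample data below is made up, and a and b are set to the placeholder key names "a" and "b".

```python
# Hypothetical sample data; a and b hold the names of the keys being compared.
a, b = "a", "b"
entries9 = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
entries10 = [{"a": 1, "b": 2}, {"a": 1, "b": 9}, {"a": 5, "b": 6}]

# "Every dict in entries9 differs from d on a or on b."
with_all = [d for d in entries10
            if all(d[a] != d2[a] or d[b] != d2[b] for d2 in entries9)]

# "No dict in entries9 agrees with d on both a and b."
with_not_any = [d for d in entries10
                if not any(d[a] == d2[a] and d[b] == d2[b] for d2 in entries9)]

print(with_all == with_not_any)
```

Note that {"a": 1, "b": 9} counts as unmatched even though its a value appears in entries9, because both keys have to agree for a match.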

Answered 2025-04-16 by Python大师
