两个字典的递归差异(键和值)?

49 投票
10 回答
54204 浏览
提问于 2025-04-16 17:04

我有一个Python字典,叫做 d1,还有一个在稍后时间的版本,叫做 d2。我想找出 d1d2 之间的所有变化。换句话说,就是找出所有新增的、删除的或修改的内容。问题在于,这些值可以是整数、字符串、列表或字典,所以需要递归处理。这是我目前的代码:

def dd(d1, d2, ctx=""):
    print "Changes in " + ctx
    for k in d1:
        if k not in d2:
            print k + " removed from d2"
    for k in d2:
        if k not in d1:
            print k + " added in d2"
            continue
        if d2[k] != d1[k]:
            if type(d2[k]) not in (dict, list):
                print k + " changed in d2 to " + str(d2[k])
            else:
                if type(d1[k]) != type(d2[k]):
                    print k + " changed to " + str(d2[k])
                    continue
                else:
                    if type(d2[k]) == dict:
                        dd(d1[k], d2[k], k)
                        continue
    print "Done with changes in " + ctx
    return

这段代码运行得很好,除了当值是列表的时候。我还没想到一个优雅的办法来处理列表,否则就得在 if(type(d2) == list) 后面重复写一大段稍微改动过的代码。

有什么想法吗?

补充说明:这和这篇帖子不同,因为这里的键也可能会改变。

10 个回答

22

这是一个受到 Winston Ewert 启发的实现方式。

def recursive_compare(d1, d2, level='root'):
    if isinstance(d1, dict) and isinstance(d2, dict):
        if d1.keys() != d2.keys():
            s1 = set(d1.keys())
            s2 = set(d2.keys())
            print('{:<20} + {} - {}'.format(level, s1-s2, s2-s1))
            common_keys = s1 & s2
        else:
            common_keys = set(d1.keys())

        for k in common_keys:
            recursive_compare(d1[k], d2[k], level='{}.{}'.format(level, k))

    elif isinstance(d1, list) and isinstance(d2, list):
        if len(d1) != len(d2):
            print('{:<20} len1={}; len2={}'.format(level, len(d1), len(d2)))
        common_len = min(len(d1), len(d2))

        for i in range(common_len):
            recursive_compare(d1[i], d2[i], level='{}[{}]'.format(level, i))

    else:
        if d1 != d2:
            print('{:<20} {} != {}'.format(level, d1, d2))

if __name__ == '__main__':
    d1={'a':[0,2,3,8], 'b':0, 'd':{'da':7, 'db':[99,88]}}
    d2={'a':[0,2,4], 'c':0, 'd':{'da':3, 'db':7}}

    recursive_compare(d1, d2)

将会返回:

root                 + {'b'} - {'c'}
root.a               len1=4; len2=3
root.a[2]            3 != 4
root.d.db            [99, 88] != 7
root.d.da            7 != 3
77

如果你想要递归地找出差异,我写了一个Python的包:https://github.com/seperman/deepdiff

安装

可以从PyPi安装:

pip install deepdiff

使用示例

导入

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

相同的对象返回空

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

某个项目的类型发生了变化

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

某个项目的值发生了变化

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

项目被添加和/或移除

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

字符串差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

字符串差异 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

类型变化

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

列表差异

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

列表差异 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

忽略顺序或重复的列表差异:(使用上面相同的字典)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

包含字典的列表:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

集合:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

命名元组:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

自定义对象:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

对象属性被添加:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
10

一种选择是把你用到的列表转换成字典,使用索引作为键。比如:

# add this function to the same module
def list_to_dict(l):
    return dict(zip(map(str, range(len(l))), l))

# add this code under the 'if type(d2[k]) == dict' block
                    elif type(d2[k]) == list:
                        dd(list_to_dict(d1[k]), list_to_dict(d2[k]), k)

这里是你在评论中给出的示例字典的输出结果:

>>> d1 = {"name":"Joe", "Pets":[{"name":"spot", "species":"dog"}]}
>>> d2 = {"name":"Joe", "Pets":[{"name":"spot", "species":"cat"}]}
>>> dd(d1, d2, "base")
Changes in base
Changes in Pets
Changes in 0
species changed in d2 to cat
Done with changes in 0
Done with changes in Pets
Done with changes in base

请注意,这样做是逐个比较索引的,所以如果列表中的项目有添加或删除,就需要做一些修改才能正常工作。

撰写回答