How do I hash a Python new-style class instance?

Question

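Given an instance of a custom new-style Python class, how can I generate a unique ID value for it that can be used for various purposes? Think of it as running md5sum or sha1sum on a class instance.

My current approach is to pickle the instance, feed the pickled bytes to a hashlib digest, and store the resulting hexdigest string in an attribute of the class (this attribute is never part of the pickle/unpickle procedure, by the way). Now I've run into a case where a third-party module uses nested classes, and there is no good way to pickle those without some hacks. I suspect I'm missing some clever Python trick for this.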

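Edit:

Here's some sample code, since a question here seems to need code to get any attention. The class below initializes fine, and its self._uniq_id attribute is set correctly.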

#!/usr/bin/env python

import hashlib

# Prefer cPickle (Python 2, C implementation); fall back to pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try

# Single class, pickles fine.
class FooBar(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = "bar"
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End

The next class, however, fails to initialize, because Bar is nested inside Foo:

#!/usr/bin/env python

import hashlib

# Prefer cPickle (Python 2, C implementation); fall back to pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try

# Nested class, can't pickle for hexdigest.
class Foo(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    class Bar(object):
        pass

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = self.Bar()
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End

The error I get is:

Traceback (most recent call last):
  File "./nest_test.py", line 70, in <module>
    foobar2 = Foo()
  File "./nest_test.py", line 49, in __init__
    self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed

(nest_test.py contains both classes, so the line numbers are offset.)

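I found that pickling wanted a __getstate__() method, so I implemented __setstate__() as well for completeness. But given the warnings about security and pickle, there has to be a better way to do this.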

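From what I've read so far, the error occurs because Python cannot resolve the nested class. It tries to look up the attribute __main__.Bar, which doesn't exist. What it really needs to find is __main__.Foo.Bar, but there is no good way to make it do that. I came across another StackOverflow answer that offers a "trick" to fool Python, but it came with a stern warning that the approach is not advisable, and that one should either use something other than pickling or move the nested class definition outside.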
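The lookup failure described above can be illustrated without pickle at all. This is only a minimal sketch, assuming the behaviour shown in the traceback, where a class is resolved by its bare `__name__` at module level. `Outer`, `Inner`, `module_namespace`, `lookup_by_name`, and `lookup_by_path` are hypothetical names for illustration; `module_namespace` is a stand-in for the module's top-level attributes, not something pickle exposes.

```python
class Outer(object):
    class Inner(object):
        pass


# Stand-in for the module's top-level namespace: only Outer is bound there.
module_namespace = {"Outer": Outer}


def lookup_by_name(cls, namespace):
    """Look the class up by its bare __name__, returning None if missing."""
    return namespace.get(cls.__name__)


def lookup_by_path(path, namespace):
    """Walk a dotted path such as "Outer.Inner" one attribute at a time."""
    parts = path.split(".")
    obj = namespace[parts[0]]
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj


# The top-level class is found under its own name.
found_outer = lookup_by_name(Outer, module_namespace)

# The nested class reports the bare name "Inner", which is not bound at
# the top level, so a name-only lookup comes back empty.
found_inner = lookup_by_name(Outer.Inner, module_namespace)

# Only the full dotted path reaches the nested class.
found_by_path = lookup_by_path("Outer.Inner", module_namespace)
```

Here the name-only lookup finds `Outer` but not `Inner`, which mirrors the "attribute lookup __main__.Bar failed" message.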

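However, I believe the original question behind that StackOverflow answer was about pickling and unpickling to a file. I only need to pickle in order to use the necessary hashlib functions, which seem to operate on a bytearray (as I'm used to in .NET), and pickling (especially cPickle) is fast and optimized compared with writing my own bytearray routine.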
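Since hashlib only needs bytes, the digest does not strictly have to come from a pickle. A minimal sketch, assuming the state is a flat dict with string keys and string values (like the one `__getstate__` returns in `FooBar` above). `state_digest` is a hypothetical helper, not part of hashlib or pickle, and it does not settle how an instance of a nested class such as `Bar` should contribute to the hash.

```python
import hashlib


def state_digest(state, length=16):
    """Hash a flat {str: str} state dict into a short hex string."""
    # Sort the keys so the byte string does not depend on dict ordering.
    parts = ["%s=%s" % (key, state[key]) for key in sorted(state)]
    # hashlib works on bytes, so encode the joined text explicitly.
    payload = "\n".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:length]


# The same state in a different insertion order gives the same digest.
digest_a = state_digest({"foo": "bar", "bar": "bar"})
digest_b = state_digest({"bar": "bar", "foo": "bar"})

# A changed value gives a different digest.
digest_c = state_digest({"foo": "baz", "bar": "bar"})
```

The "key=value" joining is deliberately naive: values containing "=" or newlines could produce ambiguous byte strings, so a real implementation would need a stricter encoding.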



2 Answers
