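Python: can't pickle module objects error
I'm trying to pickle a large class (that is, serialize it into a format that can be saved or transmitted), but I get this error:
TypeError: can't pickle module objects
I've searched the web a lot, but I still don't understand what this means, and I'm not sure which module object is causing the problem. Is there a way to find the culprit? The stack trace doesn't seem to give any hint.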
5 Answers
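Recursively finding the pickle failure
Inspired by wump's comment: Python: can't pickle module objects error
Here is some quick code that helped me find the culprit recursively.
It checks whether the object in question fails to pickle.
If it does, it tries to pickle each value in the object's __dict__ and returns only the ones that fail.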
Code snippet
import pickle

def pickle_trick(obj, max_depth=10):
    output = {}

    # Stop descending once the depth budget is used up.
    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        failing_children = []

        # Recurse into the attributes to narrow down what fails.
        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj,
            "err": e,
            "depth": max_depth,
            "failing_children": failing_children
        }

    return output
Example program
import redis
import pickle
from pprint import pformat as pf

def pickle_trick(obj, max_depth=10):
    output = {}

    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        failing_children = []

        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj,
            "err": e,
            "depth": max_depth,
            "failing_children": failing_children
        }

    return output

if __name__ == "__main__":
    r = redis.Redis()
    print(pf(pickle_trick(r)))
Example output
$ python3 pickle-trick.py
{'depth': 10,
 'err': TypeError("can't pickle _thread.lock objects"),
 'fail': Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>,
 'failing_children': [{'depth': 9,
                       'err': TypeError("can't pickle _thread.lock objects"),
                       'fail': ConnectionPool<Connection<host=localhost,port=6379,db=0>>,
                       'failing_children': [{'depth': 8,
                                             'err': TypeError("can't pickle _thread.lock objects"),
                                             'fail': <unlocked _thread.lock object at 0x10bb58300>,
                                             'failing_children': []},
                                            {'depth': 8,
                                             'err': TypeError("can't pickle _thread.RLock objects"),
                                             'fail': <unlocked _thread.RLock object owner=0 count=0 at 0x10bb58150>,
                                             'failing_children': []}]},
                      {'depth': 9,
                       'err': PicklingError("Can't pickle <function Redis.<lambda> at 0x10c1e8710>: attribute lookup Redis.<lambda> on redis.client failed"),
                       'fail': {'ACL CAT': <function Redis.<lambda> at 0x10c1e89e0>,
                                'ACL DELUSER': <class 'int'>,
                                0x10c1e8170>,
                                .........
                                'ZSCORE': <function float_or_none at 0x10c1e5d40>},
                       'failing_children': []}]}
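The nested output above shows which objects fail, but not the attribute names that lead to them. A minimal variant sketch that records the attribute path instead is shown here. The name find_unpicklable and the demo object are hypothetical, and a threading.Lock stands in for the lock held by the Redis connection pool so that the example runs without redis installed.

```python
import pickle
import threading
import types


def find_unpicklable(obj, path="obj", max_depth=10):
    """Return (attribute path, error) pairs for the deepest attributes that fail to pickle."""
    if max_depth <= 0:
        return []
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as e:
        failures = []
        # Do not descend into modules: the module itself is the culprit.
        if not isinstance(obj, types.ModuleType):
            for name, value in getattr(obj, "__dict__", {}).items():
                failures.extend(
                    find_unpicklable(value, f"{path}.{name}", max_depth - 1)
                )
        # If no child fails, this object is the one that cannot be pickled.
        return failures or [(path, repr(e))]
    return []


# Hypothetical object graph: a lock nested one level down, plus a module attribute.
demo = types.SimpleNamespace(
    name="demo",
    pool=types.SimpleNamespace(lock=threading.Lock(), size=4),
    mod=pickle,
)

for failing_path, error in find_unpicklable(demo):
    print(failing_path, error)
```

Reporting only the leaves keeps the result short: a parent that fails merely because one of its attributes fails is not listed separately.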
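Root cause - Redis can't pickle _thread.lock
In my case, creating an instance of Redis and saving it as an attribute of an object broke pickling.
When you create a Redis instance, it also creates a connection_pool, which holds thread locks (the _thread.lock and _thread.RLock objects in the output above), and those locks cannot be pickled.
I had to create and clean up the Redis instance inside the multiprocessing.Process, so that it was not attached to the object at the time it was pickled.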
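A different way to get a similar effect, not the approach described above, is to leave the unpicklable resource out of the pickled state and re-create it on first use. The following is only a sketch under stated assumptions: the Worker class and its client property are hypothetical, and a threading.Lock stands in for the Redis client so that the example is self-contained.

```python
import pickle
import threading


class Worker:
    """Hypothetical class that owns an unpicklable resource."""

    def __init__(self, name):
        self.name = name
        self._client = None  # created lazily, never pickled

    @property
    def client(self):
        # Stand-in for something like redis.Redis(); created on first access.
        if self._client is None:
            self._client = threading.Lock()
        return self._client

    def __getstate__(self):
        # Copy the instance state and drop the unpicklable resource.
        state = self.__dict__.copy()
        state["_client"] = None
        return state


w = Worker("demo")
w.client  # force creation, so the instance now holds a lock
restored = pickle.loads(pickle.dumps(w))
```

After unpickling, restored has no client yet; the first access to restored.client creates a fresh one in whichever process did the unpickling.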
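Testing
In my case, the class I was trying to pickle must be able to pickle. So I added a unit test that creates an instance of the class and pickles it. That way, if anyone modifies the class so that it can no longer be pickled, which would break its use in multiprocessing (and pyspark), we detect the regression right away.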
import pickle

def test_can_pickle():
    # Given
    obj = MyClassThatMustPickle()
    # When / Then
    pkl = pickle.dumps(obj)
    # This test will throw an error if it is no longer pickling correctly
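The test above only checks that pickle.dumps succeeds. A slightly stricter sketch also loads the bytes back and compares the restored state. The body of MyClassThatMustPickle here is a hypothetical stand-in with two plain attributes, since the real class is not shown.

```python
import pickle


class MyClassThatMustPickle:
    """Hypothetical stand-in for the real class under test."""

    def __init__(self, name="demo", retries=3):
        self.name = name
        self.retries = retries


def test_pickle_round_trip():
    # Given
    obj = MyClassThatMustPickle()
    # When
    restored = pickle.loads(pickle.dumps(obj))
    # Then: same type and same attribute values after the round trip
    assert type(restored) is type(obj)
    assert restored.__dict__ == obj.__dict__


test_pickle_round_trip()
```

Comparing __dict__ works for simple attribute values; a class whose attributes do not define equality would need its own comparison.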
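Python's inability to pickle module objects is the real problem. Is there a good reason for it? I don't think so. Having module objects unpicklable contributes to the frailty of Python as a parallel or asynchronous language. If you want to pickle module objects, or almost anything else in Python, use dill.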
Python 3.2.5 (default, May 19 2013, 14:25:55)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import os
>>> dill.dumps(os)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x02\x00\x00\x00osq\x01\x85q\x02Rq\x03.'
>>>
>>>
>>> # and for parlor tricks...
>>> class Foo(object):
...     x = 100
...     def __call__(self, f):
...         def bar(y):
...             return f(self.x) + y
...         return bar
...
>>> @Foo()
... def do_thing(x):
...     return x
...
>>> do_thing(3)
103
>>> dill.loads(dill.dumps(do_thing))(3)
103
>>>
Get dill here: https://github.com/uqfoundation/dill
I can reproduce the error message this way:
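# Python 2 only: cPickle and file() do not exist in Python 3.
import cPickle

class Foo(object):
    def __init__(self):
        self.mod = cPickle  # the instance attribute references a module

foo = Foo()
with file('/tmp/test.out', 'w') as f:
    cPickle.dump(foo, f)

# TypeError: can't pickle module objects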
Do you have a class attribute that references a module?
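For Python 3, a rough equivalent of the reproduction, together with a small helper that lists which instance attributes reference modules, might look like the sketch below. The helper name module_attributes is hypothetical, and it only inspects the top-level attributes of the instance, not nested objects.

```python
import pickle
import types


class Foo:
    def __init__(self):
        self.mod = pickle  # a module stored as an instance attribute
        self.value = 42


def module_attributes(obj):
    """Return the names of instance attributes that reference module objects."""
    return [
        name
        for name, value in vars(obj).items()
        if isinstance(value, types.ModuleType)
    ]


foo = Foo()

try:
    pickle.dumps(foo)
    pickled_ok = True
except TypeError:
    # Python 3 also raises TypeError for modules; the message wording varies by version.
    pickled_ok = False

print(pickled_ok)
print(module_attributes(foo))
```

Once the offending attribute is known, it can be removed from the instance or replaced by an import inside the methods that need the module.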