Can Python pickle lambda functions?
I have read in a number of threads that Python's pickle/cPickle cannot pickle lambda functions. However, the following code works under Python 2.7.6:
import cPickle as pickle

if __name__ == "__main__":
    s = pickle.dumps(lambda x, y: x+y)
    f = pickle.loads(s)
    assert f(3,4) == 7
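So what is going on? Or rather, what are the limits on pickling lambdas?
[EDIT] I think I know why this code runs. I forgot (sorry!) that I am running Stackless Python, which has a form of micro-threads called tasklets that execute a function. These tasklets can be halted, pickled, unpickled, and resumed, so I guess (I have asked on the Stackless mailing list) that it also provides a way to pickle function bodies.
6 Answers
What worked for me (on Windows 10 with Python 3.7) was to pass a regular function instead of a lambda function: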
def merge(x):
    return Image.merge("RGB", x.split()[::-1])

transforms.Lambda(merge)
instead of:
transforms.Lambda(lambda x: Image.merge("RGB", x.split()[::-1]))
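No need for dill or cPickle.
This may be obvious, but I would like to add another possible solution. As you probably know, a lambda function is just an anonymous function declaration. If you do not have many lambdas, each is used only once, and naming them would not clutter your code, you can simply give the lambda a name and pass that name (without parentheses), like this: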
import cPickle as pickle

def addition(x, y):
    return x+y

if __name__ == "__main__":
    s = pickle.dumps(addition)
    f = pickle.loads(s)
    assert f(3,4) == 7
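Naming the function also makes the code easier to read, and you will not need an extra dependency such as dill. But only do this if the benefit outweighs the added complexity of the extra function(s).
No, Python can't pickle lambda functions: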
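The reason a named function works where a lambda does not is that pickle stores functions by reference (module plus name) rather than by value. The following is a minimal Python 3 sketch of that behavior, assuming it is run as a script so that the function lives in `__main__`; the variable names are illustrative only.

```python
import pickle


def addition(x, y):
    return x + y


payload = pickle.dumps(addition)

# The payload records where to find the function (module and name),
# not the function's bytecode.
name_in_payload = b"addition" in payload

# Unpickling looks the name up again, so the very same object comes back.
restored = pickle.loads(payload)
same_object = restored is addition
```

A lambda's name is `<lambda>`, which cannot be looked up as a module attribute, so this by-reference scheme has nothing to point at.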
>>> import cPickle as pickle
>>> s = pickle.dumps(lambda x,y: x+y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects
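Not sure what you did that made it succeed...
Python can pickle lambda functions. We will cover Python 2 and Python 3 separately, because pickle is implemented differently in each.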
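On a stock CPython 3 interpreter the same attempt also fails, although the exception type differs from the Python 2 traceback above. Here is a small sketch that captures the failure instead of letting it propagate; `try_pickle` is a hypothetical helper, and the exact exception class and message may vary between Python versions.

```python
import pickle


def try_pickle(obj):
    """Return (payload, None) on success or (None, exception) on failure."""
    try:
        return pickle.dumps(obj), None
    except (pickle.PicklingError, AttributeError) as exc:
        # Module-level lambdas typically raise PicklingError; lambdas defined
        # inside another function may raise AttributeError on some versions.
        return None, exc


add = lambda x, y: x + y
payload, error = try_pickle(add)
```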
- Python 3.6
In Python 3 there is no module named cPickle. We have pickle instead, which by default does not support pickling lambda functions. Let's look at its dispatch table:
>> import pickle
>> pickle.Pickler.dispatch_table
<member 'dispatch_table' of '_pickle.Pickler' objects>
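Wait. I tried to look up the dispatch_table of pickle, not _pickle. _pickle is the alternative, faster C implementation of pickle. But we haven't imported it! If it is available, this C implementation is imported automatically at the end of the pure-Python pickle module.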
# Use the faster _pickle if possible
try:
    from _pickle import (
        PickleError,
        PicklingError,
        UnpicklingError,
        Pickler,
        Unpickler,
        dump,
        dumps,
        load,
        loads
    )
except ImportError:
    Pickler, Unpickler = _Pickler, _Unpickler
    dump, dumps, load, loads = _dump, _dumps, _load, _loads
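To check which implementation is active in a given interpreter, and to see how the pure-Python pickler treats function objects, something like the following sketch can be used. It relies on `pickle._Pickler`, a private name in CPython's pickle module that may change between versions; the variable names are illustrative.

```python
import pickle
import types

# If the C accelerator was imported, pickle.Pickler is the C class and
# differs from the pure-Python pickle._Pickler.
uses_c_accelerator = pickle.Pickler is not pickle._Pickler

# The pure-Python pickler keeps a per-type dispatch mapping; look up
# the handler registered for plain function objects.
handler = pickle._Pickler.dispatch.get(types.FunctionType)
handler_name = handler.__name__ if handler else None
```

In CPython 3 the handler found here is `save_global`, which pickles the function by name only.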
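We are still left with the question of pickling lambda functions in Python 3. The answer is that you cannot do it with the native pickle or _pickle. You need to import dill or cloudpickle and use it in place of the native pickle module.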
>> import dill
>> dill.loads(dill.dumps(lambda x:x))
<function __main__.<lambda>>
- Python 2.7
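pickle uses the pickle registry, which is simply a mapping from a type to the function used to serialize (pickle) objects of that type. You can view the pickle registry like this: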
>> pickle.Pickler.dispatch
{bool: <function pickle.save_bool>,
instance: <function pickle.save_inst>,
classobj: <function pickle.save_global>,
float: <function pickle.save_float>,
function: <function pickle.save_global>,
int: <function pickle.save_int>,
list: <function pickle.save_list>,
long: <function pickle.save_long>,
dict: <function pickle.save_dict>,
builtin_function_or_method: <function pickle.save_global>,
NoneType: <function pickle.save_none>,
str: <function pickle.save_string>,
tuple: <function pickle.save_tuple>,
type: <function pickle.save_global>,
unicode: <function pickle.save_unicode>}
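To pickle custom types, Python provides the copy_reg module for registering our own functions. You can read more about it here. By default, the copy_reg module supports pickling the following additional types: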
>> import copy_reg
>> copy_reg.dispatch_table
{code: <function ipykernel.codeutil.reduce_code>,
complex: <function copy_reg.pickle_complex>,
_sre.SRE_Pattern: <function re._pickle>,
posix.statvfs_result: <function os._pickle_statvfs_result>,
posix.stat_result: <function os._pickle_stat_result>}
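Now, the type of a lambda function is types.FunctionType. However, the built-in handler for this type, function: <function pickle.save_global>, cannot pickle lambda functions. Therefore, third-party libraries such as dill and cloudpickle override the built-in method with additional logic for pickling lambdas. Let's import dill and see what it does.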
>> import dill
>> pickle.Pickler.dispatch
{_pyio.BufferedReader: <function dill.dill.save_file>,
_pyio.TextIOWrapper: <function dill.dill.save_file>,
_pyio.BufferedWriter: <function dill.dill.save_file>,
_pyio.BufferedRandom: <function dill.dill.save_file>,
functools.partial: <function dill.dill.save_functor>,
operator.attrgetter: <function dill.dill.save_attrgetter>,
operator.itemgetter: <function dill.dill.save_itemgetter>,
cStringIO.StringI: <function dill.dill.save_stringi>,
cStringIO.StringO: <function dill.dill.save_stringo>,
bool: <function pickle.save_bool>,
cell: <function dill.dill.save_cell>,
instancemethod: <function dill.dill.save_instancemethod0>,
instance: <function pickle.save_inst>,
classobj: <function dill.dill.save_classobj>,
code: <function dill.dill.save_code>,
property: <function dill.dill.save_property>,
method-wrapper: <function dill.dill.save_instancemethod>,
dictproxy: <function dill.dill.save_dictproxy>,
wrapper_descriptor: <function dill.dill.save_wrapper_descriptor>,
getset_descriptor: <function dill.dill.save_wrapper_descriptor>,
member_descriptor: <function dill.dill.save_wrapper_descriptor>,
method_descriptor: <function dill.dill.save_wrapper_descriptor>,
file: <function dill.dill.save_file>,
float: <function pickle.save_float>,
staticmethod: <function dill.dill.save_classmethod>,
classmethod: <function dill.dill.save_classmethod>,
function: <function dill.dill.save_function>,
int: <function pickle.save_int>,
list: <function pickle.save_list>,
long: <function pickle.save_long>,
dict: <function dill.dill.save_module_dict>,
builtin_function_or_method: <function dill.dill.save_builtin_method>,
module: <function dill.dill.save_module>,
NotImplementedType: <function dill.dill.save_singleton>,
NoneType: <function pickle.save_none>,
xrange: <function dill.dill.save_singleton>,
slice: <function dill.dill.save_slice>,
ellipsis: <function dill.dill.save_singleton>,
str: <function pickle.save_string>,
tuple: <function pickle.save_tuple>,
super: <function dill.dill.save_functor>,
type: <function dill.dill.save_type>,
weakcallableproxy: <function dill.dill.save_weakproxy>,
weakproxy: <function dill.dill.save_weakproxy>,
weakref: <function dill.dill.save_weakref>,
unicode: <function pickle.save_unicode>,
thread.lock: <function dill.dill.save_lock>}
Now let's try to pickle a lambda function.
>> pickle.loads(pickle.dumps(lambda x:x))
<function __main__.<lambda>>
It works!!
In Python 2 we have two versions of pickle:
import pickle # pure Python version
pickle.__file__ # <install directory>/python-2.7/lib64/python2.7/pickle.py
import cPickle # C extension
cPickle.__file__ # <install directory>/python-2.7/lib64/python2.7/lib-dynload/cPickle.so
Now let's try to pickle a lambda with the C implementation, cPickle.
>> import cPickle
>> cPickle.loads(cPickle.dumps(lambda x:x))
TypeError: can't pickle function objects
What went wrong? Let's look at the dispatch table of cPickle.
>> cPickle.Pickler.dispatch_table
AttributeError: 'builtin_function_or_method' object has no attribute 'dispatch_table'
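The implementations of pickle and cPickle are different. Importing dill only makes the pure-Python version of pickle work. The drawback of using pickle instead of cPickle is that it can be as much as 1000 times slower.
Hope this clears up your doubts.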
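For readers on Python 3.8 or later, the kind of "extra logic" that third-party libraries add can be approximated with the standard library alone. The sketch below is not how dill or cloudpickle are implemented; it is a deliberately limited illustration that subclasses `pickle.Pickler` and uses its `reducer_override` hook to serialize a lambda's code object with `marshal`. The names `LambdaPickler`, `_rebuild_function`, and `dumps_with_lambdas` are hypothetical, closures and global references are not supported, and marshalled bytecode is only valid for the same Python version. It assumes the code is run as a script so that the rebuild helper can be found by name when unpickling.

```python
import builtins
import io
import marshal
import pickle
import types


def _rebuild_function(code_bytes, name, defaults):
    """Recreate a function from marshalled bytecode (hypothetical helper)."""
    code = marshal.loads(code_bytes)
    # Only builtins are visible to the rebuilt function in this sketch.
    return types.FunctionType(code, {"__builtins__": builtins}, name, defaults)


class LambdaPickler(pickle.Pickler):
    """Pickler that serializes simple lambdas by value instead of by name."""

    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType) and obj.__name__ == "<lambda>":
            if obj.__closure__:
                raise pickle.PicklingError("closures are not supported in this sketch")
            args = (marshal.dumps(obj.__code__), obj.__name__, obj.__defaults__)
            return _rebuild_function, args
        # Everything else falls back to the normal pickling behavior.
        return NotImplemented


def dumps_with_lambdas(obj):
    buffer = io.BytesIO()
    LambdaPickler(buffer).dump(obj)
    return buffer.getvalue()


f = pickle.loads(dumps_with_lambdas(lambda x, y: x + y))
```

Because the payload contains bytecode, it should only be loaded from trusted sources, as with any pickle data.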
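Yes, Python can pickle lambda functions, but only if something uses copy_reg to register how lambda functions should be pickled. The dill package loads the copy_reg registrations you need into the pickle registry for you when you import dill.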
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import dill # the code below will fail without this line
>>>
>>> import pickle
>>> s = pickle.dumps(lambda x, y: x+y)
>>> f = pickle.loads(s)
>>> assert f(3,4) == 7
>>> f
<function <lambda> at 0x10aebdaa0>
Get dill here: https://github.com/uqfoundation