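Modifying an object through Python multiprocessing.Pool: strange behavior
I have an object with two attributes: a dict and an int. When I modify this object in a worker process using multiprocessing.Pool, the object that comes back has the updated int, but the dict is unchanged. Why is that?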
from multiprocessing import Pool

def fork():
    someObject = SomeClass()
    for i in range(10):
        someObject.method(i)
    print("in fork, someObject has dct=%s and nbr=%i" % (someObject.dct, someObject.nbr))
    return someObject

def test():
    pool = Pool(processes=1)
    result = pool.apply(func=fork)
    print("in main, someObject has dct=%s and nbr=%i" % (result.dct, result.nbr))

class SomeClass(object):
    dct = {}
    nbr = 0
    def method(self, nbr):
        self.dct[nbr] = nbr
        self.nbr += nbr

if __name__ == '__main__':
    test()
Output:
in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
in main, someObject has dct={} and nbr=45
2 Answers
1
I found an alternative solution. Instead of dict(), I used multiprocessing.Manager().dict(), and it worked well.
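A minimal sketch of that approach, assuming the dict is handed to the worker as an argument rather than stored on the class. The names `record` and `fill_shared_dict` are hypothetical and do not come from the question; only `Manager()`, `manager.dict()` and `Pool` are from the standard library.

```python
from multiprocessing import Manager, Pool


def record(shared, i):
    # `shared` is a proxy object; the write is forwarded to the manager process
    shared[i] = i


def fill_shared_dict(count=10):
    with Manager() as manager:
        shared = manager.dict()
        with Pool(processes=1) as pool:
            # starmap blocks until every task has finished
            pool.starmap(record, [(shared, i) for i in range(count)])
        # Copy into a plain dict before the manager shuts down
        return dict(shared)


if __name__ == "__main__":
    print(fill_shared_dict())
```

Because every write goes through the manager process, the parent sees the changes without the dict having to be pickled back as part of a return value. The price is one inter-process round trip per access.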
2
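The parent and child processes each have their own copy of SomeClass.dct and SomeClass.nbr.
The reason nbr is updated while dct is not is that executing self.nbr += nbr creates an instance variable named nbr, and instance variables are pickled (serialized) and sent back to the parent process. You never assign anything to self.dct, however, so self.dct (which really refers to the class attribute SomeClass.dct) is never pickled.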
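The difference can be checked without any multiprocessing by looking at the instance's own `__dict__`, which is what pickle serializes by default. This is a small sketch that reuses the class from the question; the variable names `obj` and `instance_state` are only for illustration.

```python
class SomeClass(object):
    dct = {}
    nbr = 0

    def method(self, nbr):
        self.dct[nbr] = nbr   # mutates the class-level dict in place
        self.nbr += nbr       # reads SomeClass.nbr, then binds an instance attribute


obj = SomeClass()
for i in range(10):
    obj.method(i)

instance_state = vars(obj)
print(instance_state)             # {'nbr': 45}
print(SomeClass.nbr)              # 0
print(obj.dct is SomeClass.dct)   # True
```

Only nbr lives on the instance. The dict was filled in, but it still belongs to the class, and the class object in the parent process never saw those writes.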
You can see what gets pickled by defining a __getstate__() method in SomeClass:
class SomeClass(object):
    dct = {}
    nbr = 0
    def method(self, nbr):
        self.dct[nbr] = nbr
        self.nbr += nbr
    def __getstate__(self):
        # Return the instance state unchanged, but show what is being pickled
        res = self.__dict__
        print("pickled", res)
        return res
This prints:
in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
('pickled', {'nbr': 45})
in main, someObject has dct={} and nbr=45
You can force dct to be pickled by assigning it to "itself":
class SomeClass(object):
    dct = {}
    nbr = 0
    def method(self, nbr):
        self.dct[nbr] = nbr
        # Bind the dict as an instance attribute so it ends up in self.__dict__
        self.dct = self.dct
        self.nbr += nbr
With the __getstate__() method from the previous snippet still in place, this prints:
in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
('pickled', {'nbr': 45, 'dct': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}})
in main, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
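A more conventional way to get the same effect as the self-assignment, offered here as a sketch rather than as part of the original answer, is to create both attributes in `__init__` so that they are instance attributes from the start. The pickle round trip below stands in for what Pool does when it sends a result back to the parent; `obj` and `restored` are illustrative names.

```python
import pickle


class SomeClass(object):
    def __init__(self):
        # Instance attributes: both live in self.__dict__ and are pickled
        self.dct = {}
        self.nbr = 0

    def method(self, nbr):
        self.dct[nbr] = nbr
        self.nbr += nbr


obj = SomeClass()
for i in range(10):
    obj.method(i)

# Serialize and deserialize, as happens when a worker returns the object
restored = pickle.loads(pickle.dumps(obj))
print(sorted(vars(restored)))  # ['dct', 'nbr']
```

This also means each instance gets its own dict instead of all instances sharing the single class-level one.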