Multiprocessing: How do I share a dict among multiple processes?
This program creates several child processes (workers) that work off a joinable queue Q and may eventually manipulate a global dictionary D to store results. That is, each worker can use D to save its own results and also see the results produced by the other workers.

When I print the dictionary D from inside a worker, I can see the modifications that have been made to it (i.e., the contents of D). But after the main program has finished processing the queue Q, when I print D it is an empty dictionary!

I understand this is probably a synchronization or locking issue. Can someone tell me what is happening here, and how I can synchronize access to D?
6 Answers
41
Multiprocessing is not the same as multithreading: each child process gets a copy of the main process's memory. Generally, state is shared via communication (pipes or sockets), signals, or shared memory.
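The symptom described in the question can be reproduced with a minimal sketch (the names `d` and `worker` here are just illustrative): a child's writes to a plain module-level dict never reach the parent, because the child only modifies its own copy.

```python
import multiprocessing as mp

d = {}  # a plain dict: each child process works on its own copy


def worker():
    d['from_child'] = 42  # modifies only the child's copy of the dict


if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print(d)  # the parent's copy is unchanged: {}
```

This is exactly why D appears empty in the main program even though the workers can see their own modifications.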
Multiprocessing provides some abstractions for your use case: shared state that can be treated as if it were local, by using proxies or shared memory: http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes

See the "Sharing state between processes" section of those docs in particular.
42
In addition to what @senderle said, some may also be wondering how to use the functionality of multiprocessing.Pool here.

The nice thing is that the manager instance has a .Pool() method whose usage mimics the familiar top-level multiprocessing API.
from itertools import repeat
import multiprocessing as mp
import os
import pprint


def f(d: dict) -> None:
    pid = os.getpid()
    d[pid] = f"Hi, I was written by process {pid:d}"


if __name__ == '__main__':
    with mp.Manager() as manager:
        d = manager.dict()
        with manager.Pool() as pool:
            pool.map(f, repeat(d, 10))
        # `d` is a DictProxy object that can be converted to dict
        pprint.pprint(dict(d))
Output:
$ python3 mul.py
{22562: 'Hi, I was written by process 22562',
22563: 'Hi, I was written by process 22563',
22564: 'Hi, I was written by process 22564',
22565: 'Hi, I was written by process 22565',
22566: 'Hi, I was written by process 22566',
22567: 'Hi, I was written by process 22567',
22568: 'Hi, I was written by process 22568',
22569: 'Hi, I was written by process 22569',
22570: 'Hi, I was written by process 22570',
22571: 'Hi, I was written by process 22571'}
This is a slightly different example: each process simply records its process ID in the global DictProxy object d.
247
A general solution is to use a Manager object. This is adapted from the docs:
from multiprocessing import Process, Manager


def f(d):
    d[1] += '1'
    d['2'] += 2


if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    d[1] = '1'
    d['2'] = 2

    p1 = Process(target=f, args=(d,))
    p2 = Process(target=f, args=(d,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print(d)
Output:
$ python mul.py
{1: '111', '2': 6}