Manager dict in multiprocessing

Posted 2024-06-17 15:07:41


Here is a simple piece of multiprocessing code:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The result I get is:

{1: []}

Why don't I get {1: [4]} as the output?


3 answers

Here is what you wrote:

# from here, code executes in the main process and (when the module is
# re-imported, as on Windows) in every child process
# every process performs all of these imports
from multiprocessing import Process, Manager

# every process creates its own 'manager' and 'd'
manager = Manager()
# BTW, the Manager is also a child process, and
# during its initialization it creates a new Manager, and that new Manager
# creates another one, and so on
# Did you check how many Python processes were running on your system? A lot!
d = manager.dict()

def f():
    # this 'd' is the global 'd' defined in the current process
    d[1].append(4)
    print d

if __name__ == '__main__':
    # from here on, code executes ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Here is what you should write:

from multiprocessing import Process, Manager

def f(d):
    # build a new list and assign it back, so the manager sees the change
    d[1] = d[1] + [4]
    print d

if __name__ == '__main__':
    manager = Manager()  # create only one manager
    d = manager.dict()   # create only one dict
    d[1] = []
    p = Process(target=f, args=(d,))  # tell 'f' which 'd' to append to
    p.start()
    p.join()
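
For readers on Python 3, the same fix can be sketched as below. This is only an illustration under a few assumptions: the names append_four and run_fixed are made up for this example, and the shared dict is copied into a plain dict before the manager shuts down so that the result can still be inspected afterwards.

```python
from multiprocessing import Process, Manager


def append_four(d):
    # d[1] returns a local copy of the list, so build the new value
    # and assign it back through the proxy (hypothetical helper name)
    d[1] = d[1] + [4]


def run_fixed():
    # the manager and the shared dict are created once, by the caller only
    with Manager() as manager:
        d = manager.dict()
        d[1] = []
        p = Process(target=append_four, args=(d,))
        p.start()
        p.join()
        # copy the contents out while the manager is still running
        return d.copy()


if __name__ == '__main__':
    print(run_fixed())
```

Passing d explicitly through args, rather than relying on a module-level global, is what lets the child talk to the same manager as the parent.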

I think this is a bug in the manager's proxy calls. You can work around it by not calling methods directly on the shared list, like this:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # get the shared list
    shared_list = d[1]

    shared_list.append(4)

    # forces the shared list to 
    # be serialized back to manager
    d[1] = shared_list

    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

    print d

The reason the new item appended to d[1] is not printed is stated in Python's official documentation:

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy.

So, this is what actually happens:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # invoke d.__getitem__(), returning a local copy of the empty list assigned by the main process,
    # (consider that a KeyError exception wasn't raised, so a list was definitely returned),
    # and append 4 to it, however this change is not propagated through the manager,
    # as it's performed on an ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # returning the "remote" string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print d

if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Reassigning d[1] after the update, with a new list or even with the same list again, triggers the manager to propagate the change:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # perform the exact same steps, as explained in the comments to the previous code snippet above,
    # but in addition, invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

d[1] += [4] would work as well.
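
The three update styles discussed so far can be compared side by side. The following is a minimal Python 3 sketch, not part of the original answer; the function name compare_updates and the extra keys 2 and 3 are assumptions made for illustration. It runs in a single process, since the proxy behaves the same way whether it is used from the parent or from a child.

```python
from multiprocessing import Manager


def compare_updates():
    # hypothetical helper: try three ways of "appending 4" to a shared list
    with Manager() as manager:
        d = manager.dict()
        d[1] = []
        d[2] = []
        d[3] = []

        # in-place append on the copy returned by __getitem__: the change is lost
        d[1].append(4)

        # fetch a copy, modify it, then assign it back via __setitem__
        tmp = d[2]
        tmp.append(4)
        d[2] = tmp

        # augmented assignment is a get followed by a set, so it propagates too
        d[3] += [4]

        # return a plain dict snapshot of the shared state
        return d.copy()


if __name__ == '__main__':
    print(compare_updates())
```

Only the first style leaves the shared value untouched, because it never calls __setitem__ on the dict proxy.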


Edit for Python 3.6 or later:

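Since Python 3.6, per this changeset following this issue, it is also possible to use nested proxy objects, which automatically propagate any changes made to them to the containing proxy object. Thus, replacing the line d[1] = [] with d[1] = manager.list() would also correct the issue: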

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    # the __str__() method of a dict object invokes __repr__() on each of its items,
    # so explicitly invoking __str__() is required in order to print the actual list items
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()

Unfortunately, this bug fix was not backported to Python 2.7 (as of Python 2.7.13).
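
As a rough Python 3.6+ sketch of the nested-proxy approach, with the manager created inside a function and the dict passed to the child explicitly: the names append_in_place and run_nested are hypothetical, chosen only for this example.

```python
from multiprocessing import Process, Manager


def append_in_place(d):
    # d[1] is itself a list proxy here, so append() is forwarded to the manager
    d[1].append(4)


def run_nested():
    with Manager() as manager:
        d = manager.dict()
        d[1] = manager.list()  # nested proxy instead of a plain list
        p = Process(target=append_in_place, args=(d,))
        p.start()
        p.join()
        # copy the shared list into a plain list before the manager exits
        return list(d[1])


if __name__ == '__main__':
    print(run_nested())
```

Converting the nested proxy with list() avoids printing the proxy's repr and gives the parent an ordinary list to work with.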


Note (for running under the Windows operating system):

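Although the described behavior also applies to the Windows operating system, the attached code snippets fail when executed under Windows because of its different process creation mechanism: Windows does not support the fork() system call, so each new process is started through a separate process-creation API instead.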

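Whenever a new process is created through the multiprocessing module, Windows starts a fresh Python interpreter process that imports the main module, with potentially hazardous side effects. To avoid this problem, the following programming guideline is recommended: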

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).

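Therefore, executing the attached code snippets as they are under Windows would try to create an infinite number of processes because of the manager = Manager() line. This is easily fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as is done in this answer.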

For more details on this issue, see this answer.
