Is shared read-only data copied to different processes for multiprocessing?

Asked 2025-04-16 15:06

I have a piece of code that looks something like this:

from multiprocessing import Pool

glbl_array = ...  # placeholder for a 3 GB array

def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param
    pass

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))

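Is there a way to make sure (or at least encourage) that the different processes do not each get a copy of glbl_array but share it instead? If there is no way to avoid the copy, I will go with a memory-mapped array, but my access patterns are not very regular, so I expect memory-mapped arrays to be slower. The approach above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and don't want to bother the sysadmin. Do you think it would help if the second parameter were a genuinely immutable object, such as glbl_array.tostring()?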
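
The memory-mapped fallback mentioned in the question can be sketched with the standard library's mmap module. This is a minimal illustration under stated assumptions, not a tuned implementation: the data are native-endian doubles written to a temporary file, write_doubles, sum_at and main are hypothetical helper names, and the fork start method is used only to keep the example short (each worker opens the file by path, so nothing is inherited). Because every process maps the same file with MAP_SHARED, all of them read from the same page-cache pages rather than from private copies.

```python
import mmap
import multiprocessing as mp
import os
import struct
import tempfile


def write_doubles(path, values):
    # Pack the values as native-endian C doubles, the layout that
    # memoryview.cast("d") expects when they are read back
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}d", *values))


def sum_at(path, indices, conn):
    # Each worker maps the file read-only; MAP_SHARED mappings of the same
    # file are backed by the same page-cache pages, not per-process copies
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The views are released before the mmap is closed (inner "with" exits first)
        with memoryview(mm) as raw, raw.cast("d") as view:
            conn.send(sum(view[i] for i in indices))
    conn.close()


def main():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_doubles(path, [float(x) for x in range(1000)])
        # fork only keeps the sketch short: workers open the file by path,
        # so nothing needs to be inherited from the parent
        ctx = mp.get_context("fork")
        jobs = [[1, 500, 999], [0, 2, 4]]  # irregular index sets
        results = []
        for indices in jobs:
            recv_conn, send_conn = ctx.Pipe(duplex=False)
            p = ctx.Process(target=sum_at, args=(path, indices, send_conn))
            p.start()
            results.append(recv_conn.recv())
            p.join()
        return results
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(main())  # [1500.0, 6.0]
```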

Tags: inter-process communication, immutable objects, multiprocessing, shared memory, memory mapping, read-only data

5 Answers

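For those using Windows, which does not support fork() (except under Cygwin), pv's answer does not work: globals are not made available to child processes.

Instead, you must pass the shared memory when initializing the Pool/Process, like this: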

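#!/usr/bin/env python3

import time

from multiprocessing import Process, Queue, Array

def f(q, a):
    # Wait for the parent's first message, then read the shared array
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])
    # By the second message the parent has overwritten a[0:3]
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])

if __name__ == '__main__':
    # Shared array of unsigned bytes; lock=False since the child only reads it
    a = Array('B', (1, 2, 3), lock=False)
    q = Queue()
    # Pass the shared array explicitly so this also works without fork()
    p = Process(target=f, args=(q, a))
    p.start()
    q.put([1, 2, 3])
    time.sleep(1)
    # The child sees this write because the array lives in shared memory
    a[0:3] = (4, 5, 6)
    q.put([4, 5, 6])
    p.join()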

(This is not NumPy and not great code, but it illustrates the point ;-)
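
The answer only shows Process; the Pool variant it mentions can be sketched with the standard library alone, assuming the data is read-only. RawArray creates shared memory without a lock, like Array(..., lock=False) above. ROWS, WIDTH, _shared, init_worker and row_sum are illustrative names, not part of the answer. The start method is forced to spawn, the only one Windows has, so the same behavior can be checked on Linux.

```python
import multiprocessing as mp

ROWS, WIDTH = 3, 4  # illustrative table shape
_shared = None      # set in each worker process by init_worker()


def init_worker(shared):
    # Runs once in every worker; the array arrives through initargs,
    # i.e. it is handed over while the worker process is being created
    global _shared
    _shared = shared


def row_sum(i):
    # Read-only access to row i of the flattened ROWS x WIDTH table
    return sum(_shared[i * WIDTH:(i + 1) * WIDTH])


def main(start_method="spawn"):
    ctx = mp.get_context(start_method)
    # RawArray: shared memory without a lock, like Array(..., lock=False)
    shared = ctx.RawArray("d", [float(x) for x in range(ROWS * WIDTH)])
    with ctx.Pool(processes=2, initializer=init_worker, initargs=(shared,)) as pool:
        return pool.map(row_sum, range(ROWS))


if __name__ == "__main__":
    print(main())  # [6.0, 22.0, 38.0]
```

Passing shared to pool.map as a task argument instead would raise a RuntimeError, because shared ctypes objects may only be handed to a child process when it is created; that is why the initializer is used.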

Answered 2025-04-16 by Python大师

The code below works on Win7 and Mac (it may also work on Linux, but I have not tested it).

import multiprocessing
import ctypes
import numpy as np

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

shared_array = None

def init(shared_array_base):
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)

    pool = multiprocessing.Pool(processes=4, initializer=init, initargs=(shared_array_base,))
    pool.map(my_func, range(10))

    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)

Answered 2025-04-16 by Python大师

You can use the shared memory support in multiprocessing together with NumPy quite easily:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

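# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i,:] = i

if __name__ == '__main__':
    # This relies on fork(): the workers inherit shared_array_base from the
    # parent. With "spawn" (the default on Windows and macOS) or "forkserver",
    # each worker re-imports the module and creates its own, unrelated array,
    # so the parent would print zeros.
    pool = multiprocessing.get_context('fork').Pool(processes=4)
    pool.map(my_func, range(10))

    print(shared_array)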

This prints:

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]

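However, on Linux fork() uses copy-on-write, so even without multiprocessing.Array the data is not actually copied unless a process writes to it. One caveat: CPython updates an object's reference count every time the object is used, and that is itself a write, so touching many small Python objects (for example the elements of a large list) gradually copies the pages they live on. A NumPy array keeps its data in one separate buffer, so reading from it does not cause that buffer to be copied.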
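
A minimal sketch of that copy-on-write behavior, assuming a POSIX system where the fork start method exists (Windows has none). It uses a standard-library array.array in place of a NumPy array so it runs without NumPy, and data, read_in_child, write_in_child and run are illustrative names. The child reads the module-level data without it being passed or pickled, while a write made in a child only changes that child's private copy; that is why the answer uses multiprocessing.Array when the workers write results back.

```python
import multiprocessing as mp
from array import array

# Created at import time in the parent; a forked child inherits it as-is
data = array("d", range(1_000_000))


def read_in_child(conn):
    # Reads the inherited global directly: nothing is pickled or sent over
    conn.send(sum(data[:10]))
    conn.close()


def write_in_child(conn):
    # The write makes the kernel copy the touched page for this child only
    data[0] = -1.0
    conn.send(data[0])
    conn.close()


def run(target):
    ctx = mp.get_context("fork")  # not available on Windows
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    p = ctx.Process(target=target, args=(send_conn,))
    p.start()
    result = recv_conn.recv()
    p.join()
    return result


if __name__ == "__main__":
    print(run(read_in_child))   # 45.0
    print(run(write_in_child))  # -1.0, as seen inside the child
    print(data[0])              # 0.0: the parent's copy is unchanged
```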

Answered 2025-04-16 by Python大师
