Sharing a variable across multiple processes in Python
How can I read and update a variable that is shared by multiple worker processes in Python?
For example, I am scanning a set of files with multiple processes and want to check whether each file's parent directory has already been scanned.
import os
import multiprocessing

def readFile(filename):
    """ Add the parent folder to the database and process the file
    """
    path_parts = os.path.split(filename)
    dirname = os.path.basename(path_parts[0])
    if dirname not in shared_variable:
        # Insert into the database
        pass
    # Other file functions

def main():
    """ Walk through files and pass each file to readFile()
    """
    queue = multiprocessing.Queue()
    # init (defined elsewhere) initializes each worker with the queue
    pool = multiprocessing.Pool(None, init, [queue])
    for dirpath, dirnames, filenames in os.walk(PATH):
        full_path_fnames = map(lambda fn: os.path.join(dirpath, fn),
                               filenames)
        pool.map(readFile, full_path_fnames)
2 Answers
0
Take a look at https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes. You can use Value or Array to allocate shared memory, which lets you share data between two or more processes.
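As a minimal sketch of the approach above (the `worker` function and the `"i"` typecodes are illustrative choices, not from the question): `Value` holds a single shared scalar and `Array` a fixed-size shared sequence, and the `+=` update is guarded by the value's built-in lock because read-modify-write on a shared counter is not atomic.

```python
from multiprocessing import Process, Value, Array

def worker(counter, arr):
    # Acquire the Value's built-in lock so concurrent += updates are not lost
    with counter.get_lock():
        counter.value += 1
    arr[0] = 42  # element assignment on a shared Array

if __name__ == "__main__":
    counter = Value("i", 0)       # shared C int, initially 0
    arr = Array("i", [0, 0, 0])   # shared fixed-size array of C ints
    procs = [Process(target=worker, args=(counter, arr)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4
    print(arr[0])         # 42
```

Note that `Value`/`Array` only hold C-typed scalars and fixed-size arrays; for a growable container like the directory set in the question, a `Manager` (as in the other answer) is the better fit.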
2
You can use multiprocessing.Manager to help with this. It lets you create a list that can be shared across multiple processes:
from functools import partial
import multiprocessing
import os

def readFile(shared_variable, filename):
    """ Add the parent folder to the database and process the file
    """
    path_parts = os.path.split(filename)
    dirname = os.path.basename(path_parts[0])
    if dirname not in shared_variable:
        # Insert into the database
        pass
    # Other file functions

def main():
    """ Walk through files and pass each file to readFile()
    """
    manager = multiprocessing.Manager()
    shared_variable = manager.list()
    queue = multiprocessing.Queue()
    # init (defined elsewhere) initializes each worker with the queue
    pool = multiprocessing.Pool(None, init, [queue])
    func = partial(readFile, shared_variable)
    for dirpath, dirnames, filenames in os.walk(PATH):
        full_path_fnames = map(lambda fn: os.path.join(dirpath, fn),
                               filenames)
        pool.map(func, full_path_fnames)
Here, partial is just a convenient way to pass shared_variable as the first argument of every call to readFile; map then supplies each member of full_path_fnames as the second argument.
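The partial-plus-Manager pattern can be shown end to end in a self-contained sketch (the `record` helper and the sample names are hypothetical stand-ins for readFile and the directory names): the manager process owns the real list, and each worker's membership test and append are forwarded to it through a proxy.

```python
from functools import partial
import multiprocessing

def record(shared_seen, name):
    # Both the membership test and the append go through the manager proxy.
    # Note: check-then-append is two separate calls, so it is not atomic;
    # duplicates can slip in under concurrency.
    if name not in shared_seen:
        shared_seen.append(name)

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        seen = manager.list()
        pool = multiprocessing.Pool()
        func = partial(record, seen)          # bind the shared list as the first argument
        pool.map(func, ["a", "b", "a", "c"])  # map supplies each name as the second argument
        pool.close()
        pool.join()
        print(sorted(set(seen)))
```

If exact deduplication matters, guard the check-and-append with a manager.Lock(), or collect results per worker and merge them in the parent instead.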