Sharing a variable between processes in Python multiprocessing

4 votes
2 answers
4129 views
Asked 2025-04-18 10:16

How can I read and update a variable that is shared by multiple worker processes in Python?

For example, I am scanning a set of files with multiple processes and want to check whether each file's parent directory has already been scanned.

import multiprocessing
import os

def readFile(filename):
  """ Add the parent folder to the database and process the file
  """
  path_parts = os.path.split(filename)
  dirname = os.path.basename(path_parts[0])
  if dirname not in shared_variable:  # how do I share this across workers?
    pass  # Insert into the database

  # Other file functions


def main():
  """ Walk through files and pass each file to readFile()
  """
  pool = multiprocessing.Pool()

  for dirpath, dirnames, filenames in os.walk(PATH):
    full_path_fnames = map(lambda fn: os.path.join(dirpath, fn),
                           filenames)
    pool.map(readFile, full_path_fnames)

2 Answers

0

Take a look at this link: https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes. There, `Value` and `Array` give you shared memory that can be shared between two or more processes.
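As a minimal sketch of the `Value`/`Array` approach from the linked docs (the counter-and-squares layout here is illustrative, not from the question): `Value` wraps a single typed value with a lock, and `Array` wraps a fixed-size typed array, both backed by shared memory.

```python
import multiprocessing

def worker(counter, arr, i):
    # Value exposes its lock via get_lock(); hold it while updating.
    with counter.get_lock():
        counter.value += 1
    arr[i] = i * i  # each worker writes only its own slot

def main():
    counter = multiprocessing.Value('i', 0)  # shared C int, initially 0
    arr = multiprocessing.Array('i', 4)      # shared array of 4 C ints
    procs = [multiprocessing.Process(target=worker, args=(counter, arr, i))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value, list(arr))  # 4 [0, 1, 4, 9]

if __name__ == '__main__':
    main()
```

Note that `Value` and `Array` hold a fixed C type (`'i'` is a signed int), so they fit counters and numeric buffers better than the growing list of seen directory names in the question; for that, the `Manager` answer below is a closer fit.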

2

You can use a multiprocessing.Manager to help with this. It lets you create a list that can be shared between multiple processes:

from functools import partial
import multiprocessing
import os

def readFile(shared_variable, filename):
  """ Add the parent folder to the database and process the file
  """
  path_parts = os.path.split(filename)
  dirname = os.path.basename(path_parts[0])
  if dirname not in shared_variable:
    shared_variable.append(dirname)  # remember this folder
    # Insert into the database

  # Other file functions


def main():
  """ Walk through files and pass each file to readFile()
  """
  manager = multiprocessing.Manager()
  shared_variable = manager.list()
  pool = multiprocessing.Pool()

  func = partial(readFile, shared_variable)
  for dirpath, dirnames, filenames in os.walk(PATH):
    full_path_fnames = map(lambda fn: os.path.join(dirpath, fn),
                           filenames)
    pool.map(func, full_path_fnames)

Here, partial is simply a convenient way of passing shared_variable into every call of readFile, alongside each member of full_path_fnames that map supplies.
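The pattern can be demonstrated end to end with a toy function (`mark_seen` is a hypothetical stand-in for the database work, not from the answer). One caveat worth knowing: the `in` check and the `append` are two separate round-trips to the manager process, so two workers can race and append the same name twice; the membership test itself still works for all workers.

```python
from functools import partial
import multiprocessing

def mark_seen(shared, name):
    # The manager proxy makes this list visible to every worker.
    # Note: check-then-append is not atomic, so duplicates are possible
    # under contention; use a manager.dict() or a Lock if that matters.
    if name not in shared:
        shared.append(name)
    return name

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared = manager.list()
    with multiprocessing.Pool(2) as pool:
        pool.map(partial(mark_seen, shared), ['a', 'b', 'a', 'c'])
    print(set(shared))  # contains 'a', 'b', 'c'
```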
