分别在并行进程中改变不同的python对象

1条回答

网友

1楼 · 发布于 2024-04-26 10:10:59

这似乎是一个相当复杂的类，我不能完全预料到这个解决方案在您的情况下是否有效。对于这样一个复杂的类，一个简单的折衷方法是使用^{}。在

如果这不能回答你的问题，那么最好用一个最小的、有效的例子。在

from concurrent.futures import ProcessPoolExecutor

import numpy as np

class ArrayDict ():
  keys = None
  vals = None

  def __init__ (self):
    self.keys = dict ()
    self.vals = np.random.rand (1000)

  def __str__ (self):
    return "keys: " + str(self.keys) + ", vals: " + str(self.vals.mean())

def myMethod (ad, args):
  print ("starting:", ad)


if __name__ == '__main__':
  l     = [ArrayDict() for _ in range (5)]
  args  = [2, 3, 4, 1, 3]

  with ProcessPoolExecutor (max_workers = 2) as ex:

    d = ex.map (myMethod, l, args)

对象是cloned当发送到子进程时，您需要返回结果（因为对对象的更改不会传播回主进程），并处理如何存储它们。在

Note that changes to class variables will propagate to other objects in the same process, e.g. if you have more tasks than processes, changes to class variables will be shared among the instances running in the same process. This is usually undesired behavior.

这是并行化的高级接口。ProcessPoolExecutor使用multiprocessing模块，并且只能与pickable objects一起使用。我怀疑ProcessPoolExecutor的性能与"sharing state between processes"相似。在引擎盖下，ProcessPoolExecutoris using ^{}，并且应该表现出与Pool相似的性能（除了在map中使用very long iterables）。ProcessPoolExecutor似乎是python中用于并发任务的未来API。在

如果可以的话，通常使用^{}（它可以与ProcessPoolExecutor交换）更快。在这种情况下，对象在进程之间共享，的更新将传播回主线程。在

如前所述，最快的选择可能是重新构造ArrayDict，以便它只使用可以由^{}或{}表示的对象。在

如果ProcessPoolExecutor不起作用，并且您无法优化ArrayDict，那么您可能无法使用^{}。关于如何做到这一点，有很多很好的例子here。在

The greatest performance gain is often likely to be found in myMethod. And, as I mentioned, the overhead of using threads is less than that of processes.

相关问题更多 >

编程相关推荐

热门问题

热门文章