When should I call multiprocessing.Pool.join?

Posted 2024-04-29 20:12:35


I am using multiprocessing.Pool.imap_unordered as follows:

from multiprocessing import Pool

pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    ...  # do some additional processing on mapped_result

Do I need to call pool.close or pool.join after the for loop?


2 Answers

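I ran into the same memory problem described in "Memory usage keep growing with Python's multiprocessing.pool" when I did not call pool.close() and pool.join() in a function that computed Levenshtein distances. The function worked fine, but on a Win7 64-bit machine it was not garbage collected properly: memory usage grew out of control every time the function was called, until it took down the whole operating system. Here is the code that fixed the leak: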

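from multiprocessing import Pool

# Pair the search string with every candidate string.
stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)
results = pool.map(myLevenshteinFunction, stringList)
pool.close()  # no more work will be submitted
pool.join()   # wait for the worker processes to exit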

After closing and joining the pool, the memory leak went away.
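For anyone who wants something runnable, here is a minimal sketch of the same pattern. The names levenshtein and distances_to are hypothetical stand-ins for the answer's myLevenshteinFunction and the function around it, and the distance routine is a plain dynamic-programming version, not the author's code. The point is only that each call creates its own pool and closes and joins it before returning.

```python
from multiprocessing import Pool


def levenshtein(pair):
    # Hypothetical stand-in for myLevenshteinFunction; pool.map passes one
    # argument per task, so the two strings arrive as a single tuple.
    a, b = pair
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distances_to(search_string, candidates):
    """Return the distance from search_string to each candidate, in order."""
    pairs = [(search_string, candidate) for candidate in candidates]
    pool = Pool(2)
    try:
        return pool.map(levenshtein, pairs)
    finally:
        pool.close()  # no more work will be submitted to this pool
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    print(distances_to("kitten", ["sitting", "mitten", "kitten"]))
```

The try/finally is an addition of this sketch, so the pool is cleaned up even if pool.map raises.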

No, you don't, but it's probably a good idea if you aren't going to use the pool anymore.
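Applied to the loop in the question, that could look like the sketch below. square is a hypothetical placeholder for mapping_func, and collecting into a list stands in for the "additional processing"; neither comes from the question.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical placeholder for the question's mapping_func.
    return x * x


def process_all(values):
    """Consume imap_unordered results, then close and join the pool."""
    pool = Pool(2)
    results = []
    try:
        for mapped_result in pool.imap_unordered(square, values):
            # Additional processing on mapped_result would go here.
            results.append(mapped_result)
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    # imap_unordered yields results in completion order, so sort for display.
    print(sorted(process_all(range(5))))
```

Note that using the pool as a context manager (with Pool() as pool:) is not equivalent: leaving the with block calls pool.terminate(), not close() followed by join().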

Tim Peters explains the reasons for calling pool.close and pool.join well in this SO post:

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.
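As a rough illustration of that order of operations, the sketch below submits everything with apply_async, calls close() once nothing more will be submitted, and uses join() as the synchronization point before reading results. reciprocal and run_batch are hypothetical names. In this sketch the worker's exception is observed through AsyncResult.get(), which re-raises it in the parent; that is one way such errors surface, not a demonstration of everything the quote attributes to join().

```python
from multiprocessing import Pool


def reciprocal(x):
    # Hypothetical worker; raises ZeroDivisionError when x == 0.
    return 1 / x


def run_batch(values):
    """Submit all work, close and join, then collect results or errors."""
    pool = Pool(2)
    pending = [pool.apply_async(reciprocal, (v,)) for v in values]
    pool.close()  # nothing further will be submitted
    pool.join()   # returns once the workers have finished and exited

    outcomes = []
    for async_result in pending:
        try:
            outcomes.append(async_result.get())
        except ZeroDivisionError as exc:
            # get() re-raises the exception that occurred in the worker.
            outcomes.append(repr(exc))
    return outcomes


if __name__ == "__main__":
    print(run_batch([1, 0, 4]))
```

Without the get() calls, the ZeroDivisionError from the worker would pass unnoticed in the main program.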
