How do I get the amount of "work" remaining in a Python multiprocessing Pool?

import logging
from multiprocessing import Process, Queue


class MyClass:

    def __init__(self, num_processes):
        self._log = logging.getLogger()
        self.process_list = []
        self.work_queue = Queue()
        for i in range(num_processes):
            p_name = 'CPU_%02d' % (i + 1)
            self._log.info('Initializing process %s', p_name)
            p = Process(target=do_stuff,
                        args=(self.work_queue, 'arg1'),
                        name=p_name)

while True:
    qsize = self.work_queue.qsize()
    if qsize == 0:
        self._log.info('Processing finished')
        break
    else:
        self._log.info('%d simulations still need to be calculated', qsize)

from multiprocessing import Pool


class MyClass:

    def __init__(self, num_processes):
        self.process_pool = Pool(num_processes)
        # ...
        result_list = []
        for i in range(1000):
            result = self.process_pool.apply_async(do_stuff, ('arg1',))
            result_list.append(result)
        # ---> here: how do I monitor the Pool's processing progress?
        # ...?

3 Answers

User

Answer 1 · edited 2024-05-13 20:28:10

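I came up with the solution below for asynchronous calls.

It is a trivial toy script, but I think it should apply broadly.

Basically, in an infinite loop, poll the ready() value of each result object inside a generator expression and sum the ones that are not ready; that sum is the number of dispatched pool tasks still remaining.

Once none remain, break out of the loop, then close() and join() the pool.

Add a sleep inside the loop as needed.

This follows the same principle as the queue-based solutions, but without a queue. If you also keep track of how many tasks you originally sent to the pool, you can calculate the percentage complete, and so on.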

import multiprocessing
import os
import time
from random import randrange


def worker():
    print(os.getpid())

    # simulate work
    time.sleep(randrange(5))

if __name__ == '__main__':

    pool = multiprocessing.Pool(processes=8)
    result_objs = []

    print("Begin dispatching work")

    task_count = 10
    for x in range(task_count):
        result_objs.append(pool.apply_async(func=worker))

    print("Done dispatching work")

    while True:
        incomplete_count = sum(1 for x in result_objs if not x.ready())

        if incomplete_count == 0:
            print("All done")
            break

        print(str(incomplete_count) + " Tasks Remaining")
        print(str(float(task_count - incomplete_count) / task_count * 100) + "% Complete")
        time.sleep(.25)

    pool.close()
    pool.join()
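
The polling idea can also be packaged as two small helpers. The following is only a minimal sketch, not part of the original answer: `count_remaining`, `percent_complete` and `square` are hypothetical names, and `square` merely stands in for the question's `do_stuff`.

```python
import time
from multiprocessing import Pool


def count_remaining(result_objs):
    """Return how many AsyncResult-like objects are not ready yet."""
    return sum(1 for r in result_objs if not r.ready())


def percent_complete(result_objs):
    """Return the share of finished tasks as a float between 0 and 100."""
    total = len(result_objs)
    if total == 0:
        return 100.0
    return 100.0 * (total - count_remaining(result_objs)) / total


def square(x):
    # Hypothetical stand-in for the question's do_stuff().
    time.sleep(0.05)
    return x * x


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        result_objs = [pool.apply_async(square, (i,)) for i in range(20)]
        # Poll until every dispatched task reports ready().
        while count_remaining(result_objs) > 0:
            print("%d tasks remaining, %.0f%% complete"
                  % (count_remaining(result_objs), percent_complete(result_objs)))
            time.sleep(0.1)
        values = [r.get() for r in result_objs]
    print("All done:", values[:5])
```

Because the helpers only call ready(), they should work with any list of objects returned by apply_async.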

User

Answer 2 · edited 2024-05-13 20:28:10

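Use a Manager queue. This is a queue that is shared between worker processes. A Manager queue is a proxy object, so it can be pickled and passed to every worker as an argument while still referring to the same underlying queue. A plain multiprocessing.Queue cannot be sent to pool workers this way; it can only be shared through inheritance when a process is created.

Then have your workers add items to the queue, and monitor the state of the queue while the workers are running. You need to use map_async for this, because it lets you see when the whole result is ready, which is what allows you to break out of the monitoring loop.

Example: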

import time
from multiprocessing import Pool, Manager


def play_function(args):
    """Mock function, that takes a single argument consisting
    of (input, queue). Alternately, you could use another function
    as a wrapper.
    """
    i, q = args
    time.sleep(0.1)  # mock work
    q.put(i)
    return i

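if __name__ == '__main__':
    # The guard is needed on platforms that spawn worker processes (e.g. Windows)
    p = Pool()
    m = Manager()
    q = m.Queue()

    inputs = range(20)
    args = [(i, q) for i in inputs]
    result = p.map_async(play_function, args)

    # monitor loop: q.qsize() is the number of tasks finished so far
    while True:
        if result.ready():
            break
        else:
            size = q.qsize()
            print(size)
            time.sleep(0.1)

    outputs = result.get()
    p.close()
    p.join()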
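
The docstring above mentions using another function as a wrapper. A rough sketch of that variant follows, under the assumption that each task is packed as a (function, item, queue) tuple; `tracked`, `remaining_tasks` and `slow_square` are hypothetical names introduced only for illustration.

```python
import time
from multiprocessing import Manager, Pool


def slow_square(x):
    # Hypothetical task standing in for real work.
    time.sleep(0.05)
    return x * x


def tracked(args):
    """Wrapper: run func(item), then put one token on the shared queue."""
    func, item, q = args
    result = func(item)
    q.put(1)
    return result


def remaining_tasks(total, q):
    """Tasks not yet finished = tasks submitted - tokens on the queue."""
    return total - q.qsize()


if __name__ == "__main__":
    inputs = list(range(20))
    with Manager() as m, Pool(4) as p:
        q = m.Queue()
        tasks = [(slow_square, i, q) for i in inputs]
        result = p.map_async(tracked, tasks)
        # Monitor loop: stop as soon as the whole map result is ready.
        while not result.ready():
            print("%d of %d tasks remaining"
                  % (remaining_tasks(len(inputs), q), len(inputs)))
            time.sleep(0.1)
        outputs = result.get()
    print(outputs[:5])
```

With the wrapper, the real work function does not need to know about the queue, and the remaining count is simply the number of submitted tasks minus the queue size.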

User

Answer 3 · edited 2024-05-13 20:28:10

I ran into the same problem and came up with a fairly simple solution for MapResult objects (although it relies on internal MapResult data).

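import sys
from multiprocessing import Pool
from time import sleep

# POOL_SIZE, get_stuff and todo are defined elsewhere
pool = Pool(POOL_SIZE)

result = pool.map_async(get_stuff, todo)
while not result.ready():
    # _number_left and _chunksize are private MapResult attributes
    remaining = result._number_left * result._chunksize
    sys.stderr.write('\r\033[2KRemaining: %d' % remaining)
    sys.stderr.flush()
    sleep(.1)

sys.stderr.write('\r\033[2KRemaining: 0\n')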

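Note that the remaining value is not always exact, because the chunk size is often rounded up depending on the number of items to process.

You can work around this by using pool.map_async(get_stuff, todo, chunksize=1).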
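
To see why the estimate can overshoot, here is a small arithmetic sketch. It assumes the default chunk size is the item count divided by four times the number of workers, rounded up, which is how CPython's pool has computed it; this is an internal detail that may differ between versions, and `default_chunksize` and `initial_estimate` are hypothetical helper names.

```python
def default_chunksize(num_items, num_workers):
    """Mimic the assumed default: ceil(num_items / (4 * num_workers))."""
    chunksize, extra = divmod(num_items, num_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def initial_estimate(num_items, num_workers, chunksize=None):
    """Return (chunksize, number_of_chunks, estimated_remaining) at the start.

    Assumes num_items > 0 and num_workers > 0.
    """
    if chunksize is None:
        chunksize = default_chunksize(num_items, num_workers)
    number_left = -(-num_items // chunksize)  # ceiling division
    return chunksize, number_left, number_left * chunksize


if __name__ == "__main__":
    print(initial_estimate(100, 4))               # (7, 15, 105)
    print(initial_estimate(100, 4, chunksize=1))  # (1, 100, 100)
```

With 100 items and 4 workers, the assumed default gives 15 chunks of 7, so the estimate starts at 105 rather than 100. With chunksize=1 the estimate matches the item count exactly, at the cost of more inter-process messages.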
