Finding the cause of a BrokenProcessPool in Python's concurrent.futures

Posted 2024-05-21 07:37:03


In a nutshell

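When I parallelize my code with concurrent.futures, I get a BrokenProcessPool exception. No further error is displayed. I want to find the cause of the error and am asking how to go about it.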

Full question

I am using concurrent.futures to parallelize some code.

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as pool:
    mapObj = pool.map(myMethod, args)

In the end I get (only) the following exception:

concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

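Unfortunately, the program is complex and the error only appears after it has been running for 30 minutes, so I cannot provide a good minimal example.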

To find the cause of the problem, I wrapped the method that runs in parallel in a try/except block:

def myMethod(*args):
    try:
        ...
    except Exception as e:
        print(e)

The problem remained the same, and the except block was never entered. I concluded that the exception does not come from my code.
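The behaviour described above can be reproduced in a minimal sketch, assuming the worker process is killed from outside Python's exception machinery (the question does not confirm that this is what happens). The names `die_abruptly` and `pool_breaks` are hypothetical, and the explicit "fork" context with `mp_context` assumes Linux and Python 3.7 or later, which is newer than the 3.5 used in the question.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly(_):
    """Hypothetical worker that kills its own process with SIGKILL."""
    try:
        os.kill(os.getpid(), signal.SIGKILL)
    except Exception as e:
        # Never reached: SIGKILL ends the process without raising anything.
        print(e)


def pool_breaks():
    """Return True if running die_abruptly in a pool raises BrokenProcessPool."""
    # Assumption: Linux with the "fork" start method, as a stand-in for the
    # original setup.
    ctx = multiprocessing.get_context("fork")
    try:
        with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
            list(pool.map(die_abruptly, range(2)))
    except BrokenProcessPool:
        return True
    return False


if __name__ == "__main__":
    print(pool_breaks())
```

The try/except inside the worker stays silent here as well, which is consistent with the conclusion that no Python exception is raised in the worker code.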

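My next step was to write a custom ProcessPoolExecutor class that is a subclass of the original ProcessPoolExecutor and lets me replace some methods with customized ones. I copied and pasted the original code of the method _process_worker and added some print statements.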

def _process_worker(call_queue, result_queue):
    """Evaluates calls from call_queue and places the results in result_queue.
        ...
    """
    while True:
        call_item = call_queue.get(block=True)
        if call_item is None:
            # Wake up queue management thread
            result_queue.put(os.getpid())
            return
        try:
            r = call_item.fn(*call_item.args, **call_item.kwargs)
        except BaseException as e:
            print("??? Exception ???")                 # newly added
            print(e)                                   # newly added
            exc = _ExceptionWithTraceback(e, e.__traceback__)
            result_queue.put(_ResultItem(call_item.work_id, exception=exc))
        else:
            result_queue.put(_ResultItem(call_item.work_id,
                                         result=r))

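Again, the except block is never entered. This was to be expected, because I had already made sure that my code does not raise an exception (and if everything worked properly, the exception would be passed on to the main process).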
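One case in which even `except BaseException` never runs is a fatal signal such as a segmentation fault in native code. Whether that is the cause here is an assumption, not something the question establishes. The standard `faulthandler` module can dump a Python traceback to stderr when such a signal arrives. The sketch below shows this in a throwaway child interpreter; `CHILD_CODE` and `run_crashing_child` are hypothetical names, and sending SIGSEGV to itself is only a stand-in for a real crash.

```python
import signal
import subprocess
import sys

# Code run in a separate interpreter so the crash does not affect this one.
CHILD_CODE = """
import faulthandler, os, signal
faulthandler.enable()                  # dump a traceback on fatal signals
os.kill(os.getpid(), signal.SIGSEGV)   # stand-in for a crash in native code
"""


def run_crashing_child():
    """Run the crashing child and return the completed process object."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    result = run_crashing_child()
    print("return code:", result.returncode)
    print(result.stderr)
```

In the setting of the question, the equivalent step would be calling `faulthandler.enable()` at the start of `myMethod`, so that a worker dying from such a signal leaves a traceback on stderr before it disappears.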

Now I am out of ideas about how to find the error. The exception is raised here:

def submit(self, fn, *args, **kwargs):
    with self._shutdown_lock:
        if self._broken:
            raise BrokenProcessPool('A child process terminated '
                'abruptly, the process pool is not usable anymore')
        if self._shutdown_thread:
            raise RuntimeError('cannot schedule new futures after shutdown')

        f = _base.Future()
        w = _WorkItem(f, fn, args, kwargs)

        self._pending_work_items[self._queue_count] = w
        self._work_ids.put(self._queue_count)
        self._queue_count += 1
        # Wake up queue management thread
        self._result_queue.put(None)

        self._start_queue_management_thread()
        return f

The process pool is marked as broken here:

def _queue_management_worker(executor_reference,
                             processes,
                             pending_work_items,
                             work_ids_queue,
                             call_queue,
                             result_queue):
    """Manages the communication between this process and the worker processes.
        ...
    """
    executor = None

    def shutting_down():
        return _shutdown or executor is None or executor._shutdown_thread

    def shutdown_worker():
        ...

    reader = result_queue._reader

    while True:
        _add_call_item_to_queue(pending_work_items,
                                work_ids_queue,
                                call_queue)

        sentinels = [p.sentinel for p in processes.values()]
        assert sentinels
        ready = wait([reader] + sentinels)
        if reader in ready:
            result_item = reader.recv()
        else:                               #THIS BLOCK IS ENTERED WHEN THE ERROR OCCURS
            # Mark the process pool broken so that submits fail right now.
            executor = executor_reference()
            if executor is not None:
                executor._broken = True
                executor._shutdown_thread = True
                executor = None
            # All futures in flight must be marked failed
            for work_id, work_item in pending_work_items.items():
                work_item.future.set_exception(
                    BrokenProcessPool(
                        "A process in the process pool was "
                        "terminated abruptly while the future was "
                        "running or pending."
                    ))
                # Delete references to object. See issue16284
                del work_item
            pending_work_items.clear()
            # Terminate remaining workers forcibly: the queues or their
            # locks may be in a dirty state and block forever.
            for p in processes.values():
                p.terminate()
            shutdown_worker()
            return
        ...

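That a process terminates is (or seems to be) a fact, but I have no clue why. Are my thoughts correct so far? What are possible causes that make a process terminate without a message? (Is this even possible?) Where could I apply further diagnostics? Which questions should I ask myself in order to get closer to a solution?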
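One place for further diagnostics is the exit code of the dead worker. For a `multiprocessing.Process`, a negative `exitcode` of -N means the process was terminated by signal N. The following is a minimal sketch under that idea, not a definitive procedure: `describe_exit`, `run_isolated` and `kill_self` are hypothetical helpers, and the "fork" start method assumes Linux.

```python
import multiprocessing
import os
import signal


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a readable description."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # Negative values mean "terminated by signal -exitcode".
        return "killed by signal " + signal.Signals(-exitcode).name
    return "exited with status %d" % exitcode


def run_isolated(fn, *args):
    """Run fn(*args) in its own process and return that process's exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: Linux
    proc = ctx.Process(target=fn, args=args)
    proc.start()
    proc.join()
    return proc.exitcode


def kill_self():
    """Hypothetical stand-in for a worker that is killed from outside."""
    os.kill(os.getpid(), signal.SIGKILL)


if __name__ == "__main__":
    print(describe_exit(run_isolated(kill_self)))
```

In the customized `_queue_management_worker` above, the `processes` dictionary is available in the else branch, so logging `p.exitcode` for each process there before the workers are terminated may show which signal, if any, ended the worker. A SIGKILL on Linux is often, though not always, the kernel's out-of-memory killer, which can be checked in the system log.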

I am using Python 3.5 on 64-bit Linux.

