一个Python多进程错误

0 投票
1 回答
1882 浏览
提问于 2025-04-16 06:09

我这里有一个多进程的示例,但遇到了一些问题。研究了一晚上,我还是没能找到原因。有没有人能帮我一下?

我想要一个父进程充当生产者,当有任务到来时,父进程可以派生一些子进程来处理这些任务。父进程会监控这些子进程,如果有哪个子进程异常退出,父进程可以重新启动它。


#!/usr/bin/env python
# -*- coding: utf-8 -*-

from multiprocessing import Process, Queue from Queue import Empty import sys, signal, os, random, time import traceback

child_process = []
child_process_num = 4
queue = Queue(0)

def work(queue):
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    time.sleep(10) #demo sleep 

def kill_child_processes(signum, frame):
    #terminate all children
    pass

def restart_child_process(signum, frame):
    global child_process

    for i in xrange(len(child_process)):
        child = child_process[i]

        try:
            if child.is_alive():
                continue
        except OSError, e:
            pass

        child.join() #join this process to make sure there is no zombie process

        new_child = Process(target=work, args=(queue,))
        new_child.start()
        child_process[i] = new_child #restart one new process

        child = None
        return

if __name__ == '__main__':
    reload(sys)
    sys.setdefaultencoding("utf-8")

    for i in xrange(child_process_num):
        child = Process(target=work, args=(queue,))
        child.start()
        child_process.append(child)

    signal.signal(signal.SIGINT, kill_child_processes)
    signal.signal(signal.SIGTERM, kill_child_processes) #hook the SIGTERM
    signal.signal(signal.SIGCHLD, restart_child_process)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

当这个程序运行时,会出现如下错误:

Error in atexit._run_exitfuncs:
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 30, in _run_exitfuncs
    traceback.print_exc()
  File "/usr/local/python/lib/python2.6/traceback.py", line 227, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/local/python/lib/python2.6/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/local/python/lib/python2.6/traceback.py", line 12, in _print
    def _print(file, str='', terminator='\n'):
  File "test.py", line 42, in restart_child_process
    new_child.start()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 99, in start
    _cleanup()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
    if p._popen.poll() is not None:
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes

如果我给某个子进程发送信号:kill –SIGINT {child_pid},我会得到:

[root@mail1 mail]# kill -SIGINT 32545
[root@mail1 mail]# Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call

1 个回答

1

主进程在自己退出之前,会等所有子进程都结束,所以它注册了一个阻塞调用(也就是 wait4),这个调用会让主进程停下来等着子进程完成。你发送的信号打断了这个阻塞调用,所以才会出现堆栈跟踪。

我不太明白的是,如果发送给子进程的信号被转发到了父进程,然后打断了那个 wait4 调用。这和 Unix 的进程组行为有关。

撰写回答