A Python multiprocessing error
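I have a multiprocessing demo here and I've run into some problems with it. I spent a whole night researching and still can't find the cause. Can anyone help me?
I want a parent process that acts as a producer: when tasks arrive, the parent forks some child processes to handle them. The parent monitors these children, and if one of them exits abnormally, the parent restarts it.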
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from multiprocessing import Process, Queue
from Queue import Empty
import sys, signal, os, random, time
import traceback

child_process = []
child_process_num = 4
queue = Queue(0)

def work(queue):
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    time.sleep(10) #demo sleep

def kill_child_processes(signum, frame):
    #terminate all children
    pass

def restart_child_process(signum, frame):
    global child_process

    for i in xrange(len(child_process)):
        child = child_process[i]

        try:
            if child.is_alive():
                continue
        except OSError, e:
            pass

        child.join() #join this process to make sure there is no zombie process

        new_child = Process(target=work, args=(queue,))
        new_child.start()
        child_process[i] = new_child #restart one new process

        child = None

    return

if __name__ == '__main__':
    reload(sys)
    sys.setdefaultencoding("utf-8")

    for i in xrange(child_process_num):
        child = Process(target=work, args=(queue,))
        child.start()
        child_process.append(child)

    signal.signal(signal.SIGINT, kill_child_processes)
    signal.signal(signal.SIGTERM, kill_child_processes) #hook the SIGTERM
    signal.signal(signal.SIGCHLD, restart_child_process)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
When this program runs, it fails with the following error:
Error in atexit._run_exitfuncs:
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 30, in _run_exitfuncs
    traceback.print_exc()
  File "/usr/local/python/lib/python2.6/traceback.py", line 227, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/local/python/lib/python2.6/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/local/python/lib/python2.6/traceback.py", line 12, in _print
    def _print(file, str='', terminator='\n'):
  File "test.py", line 42, in restart_child_process
    new_child.start()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 99, in start
    _cleanup()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
    if p._popen.poll() is not None:
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
If I send a signal to one of the children with kill -SIGINT {child_pid}, I get:
[root@mail1 mail]# kill -SIGINT 32545
[root@mail1 mail]# Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
1 Answer
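The main process waits for all of its child processes to finish before it exits, so an exit handler is registered that makes a blocking call (wait4, reached through p.join() and os.waitpid in your traceback). That call keeps the main process parked until the children are done. The signal you sent interrupts this blocking call, which is why you get the stack trace.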
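To make the two errno values in the tracebacks concrete, here is a minimal sketch of a wait loop that tolerates both of them. `wait_for_child` is a hypothetical helper name, not something multiprocessing provides, and the sketch assumes a Unix system. On Python 3.5 and later the interpreter already retries `os.waitpid` after an interrupted system call as long as the signal handler does not raise, so the EINTR branch mainly matters on old interpreters such as the 2.6 shown in the traceback.

```python
import errno
import os


def wait_for_child(pid):
    """Block until child `pid` exits and return its raw wait status.

    Returns None if the child was already reaped elsewhere (ECHILD).
    Hypothetical helper for illustration; not part of multiprocessing.
    """
    while True:
        try:
            _, status = os.waitpid(pid, 0)
            return status
        except OSError as exc:
            if exc.errno == errno.EINTR:
                # "Interrupted system call": a signal handler ran while we
                # were blocked, so simply wait again.
                continue
            if exc.errno == errno.ECHILD:
                # "No child processes": something else already collected
                # this child's exit status, so there is nothing to wait for.
                return None
            raise
```

The raw status can be decoded with `os.WIFEXITED` / `os.WEXITSTATUS` or `os.WIFSIGNALED` / `os.WTERMSIG`. Mixing a loop like this with `Process.join()` on the same child is exactly how ECHILD tends to appear, since only one of the two can reap it.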
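What I'm not sure about is whether the signal sent to the child was forwarded to the parent process, where it then interrupted that wait4 call. That would have something to do with how Unix process groups behave.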
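One way to check this is to record which signals the parent actually receives when a child is killed. The sketch below is an experiment under stated assumptions, not a confirmed diagnosis of the question's program: `signals_seen_by_parent` is a hypothetical name, it uses a bare `os.fork()` instead of `multiprocessing.Process`, and it assumes a Unix system. The expectation is that `kill` on a single pid delivers SIGINT only to that process, and that the parent is notified of the child's death with SIGCHLD, which the question's program handles in `restart_child_process`.

```python
import os
import signal
import time


def signals_seen_by_parent():
    """Fork a child, kill it with SIGINT, and report what the parent saw.

    Returns (seen, status): the signal numbers delivered to the parent's
    handlers, and the child's raw wait status.
    """
    seen = []

    def record(signum, frame):
        seen.append(signum)

    old_chld = signal.signal(signal.SIGCHLD, record)
    old_int = signal.signal(signal.SIGINT, record)
    ready_r, ready_w = os.pipe()
    try:
        pid = os.fork()
        if pid == 0:
            try:
                # Child: restore the default SIGINT action, as work() does
                # in the question, then idle until it is killed.
                signal.signal(signal.SIGINT, signal.SIG_DFL)
                os.write(ready_w, b"x")  # tell the parent we are set up
                while True:
                    time.sleep(1)
            finally:
                os._exit(1)  # never fall back into the parent's code path
        os.read(ready_r, 1)              # wait until the child is ready
        os.kill(pid, signal.SIGINT)      # like `kill -SIGINT <pid>`
        _, status = os.waitpid(pid, 0)   # reap the child
        time.sleep(0.1)                  # let any pending handler run
    finally:
        signal.signal(signal.SIGCHLD, old_chld)
        signal.signal(signal.SIGINT, old_int)
        os.close(ready_r)
        os.close(ready_w)
    return seen, status
```

If the parent's list contains SIGCHLD but not SIGINT, the signal was not forwarded. The interruption of the blocking wait would then come from the SIGCHLD that follows the child's death, independent of process groups, which only come into play when a signal is sent to the whole group (for example Ctrl-C in a terminal).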