Is there a bug in Python's multiprocessing Pool that leaves defunct processes behind in spawn mode?

Posted 2024-03-29 02:06:43

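We noticed that one of our deployments was leaving a number of defunct (zombie) processes behind, and we managed to produce a very small program that demonstrates the problem: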

multi.py

from multiprocessing import Pool, set_start_method

def f(x):
    return x*x

if __name__ == '__main__':
    set_start_method('spawn')
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
        p.close()
        p.join()

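This program appears to leave a zombie process behind, but that is hard to observe, because when it is run from a regular shell the shell reaps the zombie.

In our deployment it is run from another Python program, so to simulate that we have: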

main.py

from subprocess import run
from time import sleep

while True:
    result = run(["python", "multi.py"], capture_output=True)
    print(result.stdout.decode('utf-8'))
    result = run(["ps", "-ef", "--forest"], capture_output=True)
    print(result.stdout.decode('utf-8'), flush=True)
    sleep(1)

Running main.py produces the following output:

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0 11 11:33 pts/0    00:00:00 python main.py
root         8     1  0 11:33 pts/0    00:00:00 [python] <defunct>
root        17     1  0 11:33 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  6 11:33 pts/0    00:00:00 python main.py
root         8     1  3 11:33 pts/0    00:00:00 [python] <defunct>
root        19     1  0 11:33 pts/0    00:00:00 [python] <defunct>
root        28     1  0 11:33 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  4 11:33 pts/0    00:00:00 python main.py
root         8     1  1 11:33 pts/0    00:00:00 [python] <defunct>
root        19     1  3 11:33 pts/0    00:00:00 [python] <defunct>
root        30     1  0 11:33 pts/0    00:00:00 [python] <defunct>
root        39     1  0 11:33 pts/0    00:00:00 ps -ef --forest

[1, 4, 9]

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  3 11:33 pts/0    00:00:00 python main.py
root         8     1  1 11:33 pts/0    00:00:00 [python] <defunct>
root        19     1  1 11:33 pts/0    00:00:00 [python] <defunct>
root        30     1  4 11:33 pts/0    00:00:00 [python] <defunct>
root        41     1  0 11:33 pts/0    00:00:00 [python] <defunct>
root        50     1  0 11:33 pts/0    00:00:00 ps -ef --forest
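For anyone reproducing this, it can help to pull just the defunct rows out of the `ps` output instead of reading the whole listing. The following is only a sketch: `find_defunct` is a hypothetical helper name that is not part of the original repository, and it assumes the `ps -ef` column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```python
def find_defunct(ps_output):
    """Return (pid, ppid) pairs for rows of `ps -ef` output marked <defunct>.

    Assumes the column order UID PID PPID C STIME TTY TIME CMD.
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and any blank or malformed lines.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        # Everything from the eighth field onward is the command column.
        if "<defunct>" in fields[7:]:
            zombies.append((int(fields[1]), int(fields[2])))
    return zombies
```

Applied to the listings above, every defunct row comes back with PPID 1, which is the `python main.py` process itself.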

On the other hand, the following program does not produce defunct processes:

main_signal.py

from os import wait
import signal
from subprocess import run
from time import sleep

def chld_handler(_signum, _frame):
    wait()

signal.signal(signal.SIGCHLD, chld_handler)

while True:
    result = run(["python", "multi.py"], capture_output=True)
    print(result.stdout.decode('utf-8'))
    result = run(["ps", "-ef", "--forest"], capture_output=True)
    print(result.stdout.decode('utf-8'), flush=True)
    sleep(1)
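The handler above calls `wait()` once per signal. Because several children can exit while only a single SIGCHLD is delivered, a variant that drains every exited child without blocking might be more robust. This is a minimal sketch, not the code from the repository: `reap_children` and `install_sigchld_reaper` are hypothetical names, and it assumes a POSIX system where `os.waitpid(-1, os.WNOHANG)` is available.

```python
import os
import signal


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped


def install_sigchld_reaper():
    """Reap exited children whenever SIGCHLD arrives (hypothetical helper)."""
    signal.signal(signal.SIGCHLD, lambda _signum, _frame: reap_children())
```

Calling `install_sigchld_reaper()` before the `while True:` loop would take the place of the `signal.signal(...)` line. One caveat: a handler like this may also collect the child that `subprocess.run` is itself waiting on, so the return code that `run` reports should not be relied upon.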

Also, the following simple shell script does not produce zombies:

#!/usr/bin/env bash

while :; do
    python multi.py
    ps -ef --forest
    sleep 1
done

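Is this a bug in Python, or are you expected to reap any zombies coming from child processes yourself (as Bash appears to do)?

All the code, plus a Dockerfile for easily reproducing the problem, is available here: https://github.com/viktorvia/python-multi-issue

The problem is reproducible with Python 3.9.6, 3.7.4, and 3.7.11.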

