Python multiprocessing: number of processes

10 votes
2 answers
13,509 views
Asked 2025-05-01 01:56

I am using Python's multiprocessing Pool module to create a pool of processes and hand it tasks.

I create 4 processes and submit 2 jobs, but when the jobs print their process IDs I only ever see one PID, "6952"... shouldn't two process IDs be printed?

from multiprocessing import Pool
from time import sleep

def f(x):
    import os 
    print "process id = " , os.getpid()
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result  =  pool.map_async(f, (11,))   #Start job 1 
    result1 =  pool.map_async(f, (10,))   #Start job 2
    print "result = ", result.get(timeout=1)  
    print "result1 = ", result1.get(timeout=1)

Output:

result = process id =  6952
process id =  6952
 [121]
result1 =  [100]

2 Answers

0

It does print two process IDs.

result = process id =  6952  <=== process id = 6952
process id =  6952  <=== process id = 6952
 [121]
result1 =  [100]

That happens because your worker finishes so quickly that it is ready to handle the next request.

result  =  pool.map_async(f, (11,))   #Start job 1 
result1 =  pool.map_async(f, (10,))   #Start job 2

In the code above, your worker finished the first job, went back into the pool, and was ready to take the second one. There are several reasons why one particular worker ends up doing all the work; the most common is that the other workers are still busy, or simply not ready yet.
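One way to confirm this is to keep each worker busy long enough that the second job has to go to a different worker. Here is a minimal sketch (in Python 3 syntax; the 1-second delay is an arbitrary value, not part of the original code):

```python
from multiprocessing import Pool
from time import sleep
import os

def f(x):
    # Hold the worker busy long enough that the second job
    # has to be picked up by a different worker.
    sleep(1)
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    result  = pool.map_async(f, (11,))   # start job 1
    result1 = pool.map_async(f, (10,))   # start job 2
    print("result =", result.get(timeout=5))
    print("result1 =", result1.get(timeout=5))
```

With the first worker asleep inside `f`, the second job will usually be handled by another worker, so you should normally see two different PIDs (the exact scheduling still depends on the OS and start method).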

Here's an example where we have 4 workers, but only one of them is ready right away, so we know which one will pick up the work.

# https://gist.github.com/dnozay/b2462798ca89fbbf0bf4

from multiprocessing import Pool,Queue
from time import sleep

def f(x):
    import os 
    print "process id = " , os.getpid()
    return x*x

# Queue that will hold amount of time to sleep
# for each worker in the initialization
sleeptimes = Queue()
for times in [2,3,0,2]:
    sleeptimes.put(times)

# each worker will do the following init.
# before they are handed any task.
# in our case the 3rd worker won't sleep
# and get all the work.
def slowstart(q):
    import os
    num = q.get()
    print "slowstart: process id = {0} (sleep({1}))".format(os.getpid(),num)
    sleep(num)

if __name__ == '__main__':
    pool = Pool(processes=4,initializer=slowstart,initargs=(sleeptimes,))    # start 4 worker processes
    result  =  pool.map_async(f, (11,))   #Start job 1 
    result1 =  pool.map_async(f, (10,))   #Start job 2
    print "result = ", result.get(timeout=3)
    print "result1 = ", result1.get(timeout=3)

Example run:

$ python main.py 
slowstart: process id = 97687 (sleep(2))
slowstart: process id = 97688 (sleep(3))
slowstart: process id = 97689 (sleep(0))
slowstart: process id = 97690 (sleep(2))
process id =  97689
process id =  97689
result =  [121]
result1 =  [100]
4

It's mostly about timing. Windows needs to spin up 4 processes in the Pool, and those processes then need to start, initialize, and get ready to consume tasks from the Queue. On Windows this requires each child process to re-import the __main__ module, and it requires the Queue instances used internally by the Pool to be unpickled in each child. That takes a non-trivial amount of time. In fact, it takes long enough that it's possible neither of your two map_async() calls happens before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function that each worker in the Pool runs:

while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()  # This is getting the task from the parent process
        print("got {}".format(current_process()))

Output:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result =  [121]
result1 =  [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>

As you can see, Worker-1 starts up and handles both tasks before workers 2 through 4 ever try to fetch work from the Queue. If you add a sleep call in the main process after creating the Pool, but before calling map_async, you'll see a different process handle each request:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id =  5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result =  [121]
result1 =  [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>

(Note that the extra "getting"/"got" statements you see are sentinels being sent to each process to shut them down gracefully.)
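The sleep-after-Pool-creation variant described above can be sketched like this (Python 3 syntax; the 2-second delay is an arbitrary value chosen to give all four workers time to start):

```python
from multiprocessing import Pool
from time import sleep
import os

def f(x):
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    # Give every worker time to start up and block on the task queue
    # before any work is submitted.
    sleep(2)
    result  = pool.map_async(f, (11,))   # start job 1
    result1 = pool.map_async(f, (10,))   # start job 2
    print("result =", result.get(timeout=5))
    print("result1 =", result1.get(timeout=5))
```

Once all workers are idle and waiting on the queue, the two jobs are more likely to be picked up by different workers, though which worker gets which task is still up to the OS scheduler.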

On Linux with Python 3.x I can reproduce this behaviour using the 'spawn' and 'forkserver' contexts, but not with 'fork', presumably because forking the child processes is much faster than having them re-import __main__.
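To experiment with this yourself, you can select a start method explicitly via multiprocessing.get_context (a sketch; note that 'fork' is Unix-only, and 'spawn'/'forkserver' need the worker function to be importable from __main__):

```python
from multiprocessing import get_context
import os

def f(x):
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    # With 'spawn' or 'forkserver', each worker re-imports __main__ before
    # it is ready, so early tasks tend to pile up on whichever worker
    # starts first. With 'fork' (Unix-only) the workers are ready almost
    # immediately and the work spreads out more evenly.
    ctx = get_context('spawn')
    with ctx.Pool(processes=4) as pool:
        print("results =", pool.map(f, [11, 10]))
```

Swapping `'spawn'` for `'fork'` or `'forkserver'` lets you compare how quickly the workers become ready under each start method.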
