Python multiprocessing number of processes
I'm using Python's multiprocessing Pool module to create a pool of processes and assign jobs to it.
I created 4 processes and assigned 2 jobs, but when printing their process IDs I only see one ID, "6952"... Shouldn't it print 2 process IDs?
from multiprocessing import Pool
from time import sleep

def f(x):
    import os
    print "process id = " , os.getpid()
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.map_async(f, (11,))     # Start job 1
    result1 = pool.map_async(f, (10,))    # Start job 2
    print "result = ", result.get(timeout=1)
    print "result1 = ", result1.get(timeout=1)
Result:
result = process id = 6952
process id = 6952
[121]
result1 = [100]
2 Answers
It does print both process IDs.
result = process id = 6952 <=== process id = 6952
process id = 6952 <=== process id = 6952
[121]
result1 = [100]
That is because your worker process finished the first job so quickly that it was ready to handle another request.
result = pool.map_async(f, (11,)) #Start job 1
result1 = pool.map_async(f, (10,)) #Start job 2
In the code above, your worker finished its task and went back into the pool, ready to take on the second job. There can be several reasons why the other workers never get a task; the most common are that a worker is still busy, or has not finished starting up yet.
Here is an example where we have 4 workers, but only one of them is ready immediately, so we know which one will get the work.
# https://gist.github.com/dnozay/b2462798ca89fbbf0bf4
from multiprocessing import Pool, Queue
from time import sleep

def f(x):
    import os
    print "process id = " , os.getpid()
    return x*x

# Queue that will hold amount of time to sleep
# for each worker in the initialization
sleeptimes = Queue()
for times in [2, 3, 0, 2]:
    sleeptimes.put(times)

# each worker will do the following init.
# before they are handed any task.
# in our case the 3rd worker won't sleep
# and get all the work.
def slowstart(q):
    import os
    num = q.get()
    print "slowstart: process id = {0} (sleep({1}))".format(os.getpid(), num)
    sleep(num)

if __name__ == '__main__':
    pool = Pool(processes=4, initializer=slowstart, initargs=(sleeptimes,))  # start 4 worker processes
    result = pool.map_async(f, (11,))     # Start job 1
    result1 = pool.map_async(f, (10,))    # Start job 2
    print "result = ", result.get(timeout=3)
    print "result1 = ", result1.get(timeout=3)
Example:
$ python main.py
slowstart: process id = 97687 (sleep(2))
slowstart: process id = 97688 (sleep(3))
slowstart: process id = 97689 (sleep(0))
slowstart: process id = 97690 (sleep(2))
process id = 97689
process id = 97689
result = [121]
result1 = [100]
It's mostly a matter of timing. Windows needs to spawn 4 processes for the Pool, which then have to start up, initialize, and get ready to consume from the Queue. On Windows, this requires every child process to re-import the __main__ module, and also requires the Queue instances used internally by the Pool to be unpickled in each child. This takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls may finish before all the processes in the Pool are even up and running. You can see this by adding some tracing to the function that each worker in the Pool runs:
while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()  # This is getting the task from the parent process
        print("got {}".format(current_process()))
Output:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result = [121]
result1 = [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
As you can see, Worker-1 starts up and handles both tasks before workers 2 through 4 ever try to fetch from the Queue. If you add a sleep call in the main process after creating the Pool, but before calling map_async, you'll see a different process handle each request:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id = 5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result = [121]
result1 = [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
(Note that the extra "getting"/"got" statements you see are sentinels sent to each process to shut them down gracefully.)
On Linux with Python 3.x, I can reproduce this behavior using the 'spawn' and 'forkserver' contexts, but not with 'fork'. Presumably that's because forking the child processes is much faster than re-importing __main__.