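Python multiprocessing vs. threading for CPU-bound work on Windows and Linux
I wrote some test code to compare how the multiprocessing module and the threading module perform on CPU-bound work. On Linux I get the performance increase I expected: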
linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms
My dual-core MacBook Pro shows the same behavior:
osx (dual core macbook pro):
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms
Then I tried it on a Windows machine and got very different results.
windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms
Why is the multiprocessing approach so much slower on Windows?
Here is my test code:
#!/usr/bin/env python
import multiprocessing
import threading
import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper

def counter():
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()
    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()
    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()
5 Answers
5
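Creating a process is reportedly more expensive on Windows than on Linux. If you search this site you will find some information about it. Here is one link I found easily.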
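One way to check this claim on your own machine is to time workers that do nothing, so the measurement is almost entirely start-up and join overhead. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the question; the names `noop` and `startup_cost_ms` are made up for illustration and do not come from the question.

```python
import multiprocessing
import threading
import time


def noop():
    # Does no work, so the measured time is start-up and join overhead only.
    pass


def startup_cost_ms(worker_cls, n):
    """Start and join n workers of the given class; return elapsed milliseconds."""
    t1 = time.perf_counter()
    workers = [worker_cls(target=noop) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return (time.perf_counter() - t1) * 1000.0


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where child processes
    # re-import this module when they start.
    print("50 threads:   %0.3f ms" % startup_cost_ms(threading.Thread, 50))
    print("50 processes: %0.3f ms" % startup_cost_ms(multiprocessing.Process, 50))
```

If the process figure is a large share of the `parallelrun` time from the question, the slowdown is probably dominated by process creation rather than by the counting loop itself.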
27
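The Python multiprocessing documentation attributes the problems on Windows to the lack of os.fork(). That may apply here as well.
See what happens when you import psyco. First, install it with easy_install: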
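To see which mechanism your interpreter uses to create child processes, you can ask it directly. This is a small sketch for Python 3.4 or later, where `multiprocessing.get_start_method()` and `multiprocessing.get_all_start_methods()` exist; they are not available in the Python 2.6 used elsewhere in this thread, and `describe_process_creation` is a hypothetical helper name.

```python
import multiprocessing
import os


def describe_process_creation():
    """Report how this interpreter will create child processes."""
    return {
        # False on Windows, which has no fork() system call.
        "has_os_fork": hasattr(os, "fork"),
        # The method multiprocessing uses when none is chosen explicitly.
        "default_start_method": multiprocessing.get_start_method(),
        # Every method supported on this platform.
        "available_start_methods": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_process_creation().items():
        print("%s: %s" % (key, value))
```

On Windows the only available method is "spawn", which starts a fresh interpreter for each child and re-imports the main module, so each of the 50 processes in the question pays that cost.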
C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file
Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco
Add these lines to the top of your Python script:
import psyco
psyco.full()
Without psyco I get these results:
serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms
And with psyco I get these results:
serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms
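The parallel run is still slow, but the others are very fast.
Edit: also, try doing this with a multiprocessing pool. (This is my first time trying this, and it is so fast that I suspect I am missing something.)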
@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    # apply_async returns immediately; nothing here waits for the result
    result = pool.apply_async(counter, (reps,))
Results:
C:\Users\hughdbrown\Documents\python\StackOverflow>python 1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
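The pool timing above probably looks so fast because `apply_async` returns immediately and nothing waits for the result, so the decorator mostly times pool creation. Also, `counter()` as defined in the question takes no arguments, so passing `(reps,)` to it would fail in the worker, and that error only surfaces when `result.get()` is called. Below is a sketch of a version that waits for all tasks, written for Python 3. The `processes` and `pool_cls` parameters and the dummy argument on `counter` are assumptions added for illustration, not part of the original answer.

```python
import multiprocessing
import time


def counter(_=None):
    # Same busy loop as in the question; the unused argument lets pool.map call it.
    for _i in range(1000000):
        pass
    return True


def parallelpoolrun(reps, processes=4, pool_cls=multiprocessing.Pool):
    """Run counter() reps times on a pool and wait for every task to finish."""
    t1 = time.perf_counter()
    with pool_cls(processes=processes) as pool:
        # map() blocks until all tasks have completed, unlike a bare apply_async().
        results = pool.map(counter, range(reps))
    elapsed_ms = (time.perf_counter() - t1) * 1000.0
    return len(results), elapsed_ms


if __name__ == "__main__":
    done, ms = parallelpoolrun(50)
    print("parallelpoolrun finished %d tasks in %0.3f ms" % (done, ms))
```

Because the pool starts only four worker processes and reuses them for all 50 tasks, the process creation cost is paid four times instead of fifty, which should matter most on Windows.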
25
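On UNIX systems, processes are lightweight and quick to start. On Windows, processes are heavier and take much longer to start. For that reason, threads are the recommended way to do multitasking on Windows.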