Multiprocessing vs. threading in Python for CPU-bound work on Windows and Linux

33 votes
5 answers
11296 views
Asked 2025-04-15 13:39

I wrote some test code to compare the multiprocessing module with the threading module on CPU-bound work. On Linux I get the performance increase I expected:

linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms

My dual-core MacBook Pro shows the same behavior:

osx (dual core macbook pro)
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms

Then I tried it on a Windows machine and got very different results.

windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms

Why is the multiprocessing approach so much slower on Windows?

Here is my test code:

#!/usr/bin/env python

import multiprocessing
import threading
import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper


def counter():
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()

    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()

5 Answers

5

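Creating a process is said to be more expensive on Windows than on Linux. If you search this site you will find some information about it. Here is one link I found easily.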

27

The Python multiprocessing documentation attributes the problems on Windows to the lack of os.fork(). That may apply here.
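To make the os.fork() point concrete, here is a minimal sketch, assuming Python 3.4 or later (the start-method API used here does not exist in the Python 2.6 seen elsewhere in this thread). It reports which start methods the platform offers and times a handful of no-op process starts for each. The helper names start_method_info, time_process_starts and noop are made up for illustration. On Windows only "spawn" is listed, and each spawned child starts a fresh interpreter and re-imports the main module, which is also why the `if __name__ == '__main__'` guard matters there.

```python
import multiprocessing
import os
import time


def noop():
    # Empty target: any time measured is process startup and teardown.
    pass


def start_method_info():
    # Hypothetical helper: summarize how this platform can start processes.
    return {
        'has_fork': hasattr(os, 'fork'),
        'available': multiprocessing.get_all_start_methods(),
        'default': multiprocessing.get_start_method(),
    }


def time_process_starts(method, n=10):
    # Start and join n no-op processes with the given start method;
    # return the elapsed wall-clock time in milliseconds.
    ctx = multiprocessing.get_context(method)
    t1 = time.time()
    procs = [ctx.Process(target=noop) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return (time.time() - t1) * 1000.0


if __name__ == '__main__':
    info = start_method_info()
    print(info)
    for method in info['available']:
        elapsed = time_process_starts(method)
        print('%s: %0.3f ms for 10 starts' % (method, elapsed))
```

On a system that offers both "fork" and "spawn", comparing the two timings gives a rough idea of how much of the Windows slowdown could come from the start method alone.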

See what happens when you import psyco. First, install it with easy_install:

C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco

Add these lines at the top of your Python script:

import psyco
psyco.full()

Without psyco, I get:

serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms

With psyco, I get:

serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms

The parallel run is still slow, but the others are much faster.

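Edit: also, try doing this with a multiprocessing pool. (This is my first attempt at it, and it is so fast that I think I must be missing something.)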

@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    result = pool.apply_async(counter, (reps,))

Results:

C:\Users\hughdbrown\Documents\python\StackOverflow>python  1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
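
A possible reason the pool timing looks too good: apply_async returns immediately, and the snippet never waits for the result, so the timer may stop before any counting has finished. It also submits a single task and passes an argument that counter() does not accept. Below is a hedged sketch of a version that submits reps tasks and waits for all of them. It is written for Python 3, and pool_factory is a hypothetical parameter added only so the same function can be tried with multiprocessing.pool.ThreadPool.

```python
import multiprocessing
import time


def counter():
    # Same busy loop as in the question (range instead of Python 2's xrange).
    for _ in range(1000000):
        pass


def parallelpoolrun(reps, processes=4, pool_factory=multiprocessing.Pool):
    # Submit counter() reps times and block until every task has finished.
    with pool_factory(processes=processes) as pool:
        # counter takes no arguments, so no args tuple is passed.
        pending = [pool.apply_async(counter) for _ in range(reps)]
        # get() waits for the task and re-raises any error from the worker.
        for task in pending:
            task.get()
    return len(pending)


if __name__ == '__main__':
    t1 = time.time()
    done = parallelpoolrun(50)
    t2 = time.time()
    print('parallelpoolrun finished %d tasks in %0.3f ms'
          % (done, (t2 - t1) * 1000.0))
```

Because the pool starts only four worker processes and reuses them for all 50 tasks, it pays the process startup cost four times instead of fifty, so it may still beat parallelrun on Windows even when timed correctly.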

25

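Under UNIX, processes are lightweight and quick to start. Under Windows, processes are heavyweight and take much longer to start. For that reason, threads are the recommended way to do multitasking on Windows.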
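
One way to check the startup-cost claim on a given machine is to time workers that do nothing at all. This is only a sketch, written for Python 3, and time_startup and noop are illustrative names. It measures startup overhead only and says nothing about how CPU-bound work scales once the workers are running.

```python
import multiprocessing
import threading
import time


def noop():
    # Empty target so that only startup and join overhead is measured.
    pass


def time_startup(worker_class, n):
    # Start and join n workers of the given class (threading.Thread or
    # multiprocessing.Process); return elapsed time in milliseconds.
    t1 = time.perf_counter()
    workers = [worker_class(target=noop) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return (time.perf_counter() - t1) * 1000.0


if __name__ == '__main__':
    for cls in (threading.Thread, multiprocessing.Process):
        print('%s: %0.3f ms for 50 starts' % (cls.__name__, time_startup(cls, 50)))
```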
