What is some example code for demonstrating multi-core speedup in Python on Windows?

Posted 2024-06-16 08:40:23


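I am using Python 3 on Windows and trying to build a toy example that demonstrates how using multiple CPU cores can speed up a computation. The toy example is rendering the Mandelbrot fractal.

So far:

  • I have avoided threads, because the global interpreter lock prevents them from using multiple cores in this case
  • I have discarded example code that does not work on Windows, because Windows lacks Linux's fork functionality
  • I tried the multiprocessing package. I declare p = Pool(8) (8 is my number of cores) and use p.starmap(..) to delegate the work. This should spawn multiple "subprocesses", which Windows automatically assigns to different CPUs

However, I have not been able to demonstrate any speedup, whether because of overhead or because no actual multiprocessing is taking place. Pointers to a toy example with a demonstrable speedup would therefore be very helpful :-)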
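One way to tell those two causes apart is to have every task report the ID of the process that ran it. The following is a minimal sketch and not part of the original question; `report_pid` and `distinct_worker_pids` are hypothetical helper names. If the returned set contains more than one PID, the work really was spread over several processes, and a missing speedup is more likely due to overhead.

```python
import multiprocessing
import os
import time


def report_pid(_task_index):
    """Hypothetical helper: return the ID of the process running this task."""
    time.sleep(0.01)  # brief pause so one worker is unlikely to drain the whole queue
    return os.getpid()


def distinct_worker_pids(num_workers, num_tasks=64):
    """Run num_tasks trivial tasks on a pool and collect the PIDs that handled them."""
    with multiprocessing.Pool(processes=num_workers) as pool:
        return set(pool.map(report_pid, range(num_tasks)))


if __name__ == "__main__":  # needed on Windows, where child processes re-import this module
    pids = distinct_worker_pids(multiprocessing.cpu_count())
    print("Parent PID:", os.getpid())
    print("Worker PIDs seen:", sorted(pids))
```

The `if __name__ == "__main__":` guard matters on Windows because new processes are started by spawning a fresh interpreter rather than by forking.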

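Edit: Thanks! That pushed me in the right direction, and I now have a working example that demonstrates a doubling of speed on a 4-core CPU.
A copy of my code and "lecture notes": https://pastebin.com/c9HZ2vAV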

I decided to use Pool(), but will later try the "Process" alternative that @16num pointed out. Here is a code example for Pool():

    p = Pool(cpu_count())

    # starmap unpacks each (j, k) tuple into positional arguments; "partial"
    # pre-binds the extra dataarray argument so that only the coordinates vary
    partial_calculatePixel = partial(calculatePixel, dataarray=data)
    koord = []
    for j in range(height):
        for k in range(width):
            koord.append((j,k))

    #Runs the calls to calculatePixel in a pool. "hmm" collects the output
    hmm = p.starmap(partial_calculatePixel,koord)
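
For readers who want to run something like the snippet above, here is a self-contained sketch. It is an assumption about what the surrounding code might look like, not the asker's actual program (that is behind the pastebin link): `calculate_pixel`, `render`, the image size and the `chunksize` heuristic are all illustrative choices.

```python
import multiprocessing
from functools import partial


def calculate_pixel(j, k, width=120, height=80, max_iter=200):
    """Hypothetical stand-in for calculatePixel: escape-time count for row j, column k."""
    # Map the pixel to a point c in the region [-2, 1] x [-1, 1] of the complex plane.
    c = complex(-2.0 + 3.0 * k / width, -1.0 + 2.0 * j / height)
    z = 0j
    for iteration in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return iteration  # the point escaped after this many iterations
    return max_iter  # treated as inside the set


def render(width, height, max_iter, workers):
    """Compute all pixels on a pool and return them as a list of rows."""
    pixel = partial(calculate_pixel, width=width, height=height, max_iter=max_iter)
    coords = [(j, k) for j in range(height) for k in range(width)]
    # Sending pixels to the workers in batches reduces the per-task overhead.
    chunksize = max(1, len(coords) // (workers * 8))
    with multiprocessing.Pool(processes=workers) as pool:
        flat = pool.starmap(pixel, coords, chunksize=chunksize)
    return [flat[row * width:(row + 1) * width] for row in range(height)]


if __name__ == "__main__":  # needed on Windows
    image = render(width=120, height=80, max_iter=200,
                   workers=multiprocessing.cpu_count())
    print(len(image), "rows of", len(image[0]), "pixels")
```

A single pixel is a very small unit of work, so the cost of handing each one to a worker can easily outweigh the computation itself; batching through `chunksize` is one way to reduce that cost.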

1 answer
Answer #1 · posted 2024-06-16 08:40:23

Demonstrating a multiprocessing speedup is quite simple:

import multiprocessing
import time

# high-resolution clock on every platform (time.clock was removed in Python 3.8)
get_timer = time.perf_counter

def cube_function(num):
    time.sleep(0.01)  # let's simulate it takes ~10ms for the CPU core to cube the number
    return num**3

if __name__ == "__main__":  # multiprocessing guard
    # we'll test multiprocessing with pools from one to the number of CPU cores on the system
    # it won't show significant improvements after that and it will soon start going
    # downhill due to the underlying OS thread context switches
    for workers in range(1, multiprocessing.cpu_count() + 1):
        pool = multiprocessing.Pool(processes=workers)
        # lets 'warm up' our pool so it doesn't affect our measurements
        pool.map(cube_function, range(multiprocessing.cpu_count()))
        # now to the business, we'll have 10000 numbers to cube via our expensive function
        print("Cubing 10000 numbers over {} processes:".format(workers))
        timer = get_timer()  # time measuring starts now
        results = pool.map(cube_function, range(10000))  # map our range to the cube_function
        timer = get_timer() - timer  # get our delta time as soon as it finishes
        print("\tTotal: {:.2f} seconds".format(timer))
        print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
        pool.close()  # no more tasks for this pool
        pool.join()  # wait for the worker processes to exit before the next run

Here, an actual computation can be substituted. The results are as expected:

^{pr2}$

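Why isn't it 100% linear? First, mapping/distributing the data to the subprocesses and getting it back takes some time, context switching adds some overhead, other tasks use my CPUs from time to time, and time.sleep() is not exactly precise (nor could it be on a non-RT OS)... But the results are roughly in the ballpark of what is expected from parallel processing.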
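
The shortfall can be roughly quantified with Amdahl's law, a standard model that the answer does not name itself: if a fraction p of the run time parallelizes perfectly over n workers, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below uses hypothetical function names and ignores the overheads listed above, so treat its output as an estimate only.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's law for a given parallelizable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def parallel_fraction_from_speedup(speedup, workers):
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup.

    Only meaningful for workers > 1.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # The question's edit reports about a 2x speedup on a 4-core CPU.
    p = parallel_fraction_from_speedup(speedup=2.0, workers=4)
    print("Implied parallel fraction: {:.2f}".format(p))
    print("Predicted speedup on 8 workers: {:.2f}".format(amdahl_speedup(p, 8)))
```

Under this model, a 2x speedup on 4 cores corresponds to about two thirds of the run time being parallelizable.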
