Why does my Python code run slower when I use multiprocessing?



Hello, I'm getting quite frustrated because my code is not speeding up. I have tried several approaches and am now using a multiprocessing pool. To evaluate the speedup, I vary the number of processes in the pool. However, when I increase the number of processes, the code actually gets slower. I don't understand why: in theory, with 8 processes I should be computing 8 images in parallel, and with 4 processes, 4 images in parallel. Of course there is some overhead, but it should not be a major bottleneck. Has anyone spotted the mistake? Greetings, Max

'''
Created on 17.11.2017

@author: Max
'''
#!/usr/bin/env python

import os, sys, errno
import re
import argparse
from time import time
import multiprocessing
import glob
import numpy as np
import matplotlib.pyplot as plt
import cv2
def computeFeatures(input, chunk_num):
    # threshold the same image five times; only the result of the
    # last call is appended and returned together with the chunk index
    thresholded_chunk = []
    #print("Processing Chunk,",chunk_num)
    cv2.threshold(input, 127, 255, cv2.THRESH_BINARY_INV)
    cv2.threshold(input, 127, 255, cv2.THRESH_BINARY_INV)
    cv2.threshold(input, 127, 255, cv2.THRESH_BINARY_INV)
    cv2.threshold(input, 127, 255, cv2.THRESH_BINARY_INV)
    thresholded_chunk.append(cv2.threshold(input, 127, 255, cv2.THRESH_BINARY_INV))
    return (thresholded_chunk, chunk_num)


if __name__ == '__main__':
    num_Proc = 2
    max_Proc = 20
    while num_Proc != max_Proc:

        start = time()
        # Handle command line options
        numProcessors = num_Proc

        # Start my pool
        pool = multiprocessing.Pool(numProcessors)

        # Build task list
        path = "InputSimulation\*" 
        tasks = []
        image_list= []
        img_idx = 0
        image_pathes = glob.glob(path+".jpg")
        results = []
        index_for_chunk = numProcessors
        while img_idx < len(image_pathes):
            #print("InsterImageNumber",img_idx)
            tasks.append( (cv2.imread(image_pathes[img_idx],0), img_idx, ) )
            if img_idx % numProcessors == 0:
                result = [pool.apply_async( computeFeatures, t ) for t in tasks]
                results.append(result)
                tasks = []
            img_idx +=1
        pool.close()
        pool.join()
        # Run tasks    # Flatten list before print

        end = time()
        print("DURATION FOR " +str(num_Proc) +" PROCESSES",end - start)
        num_Proc +=1
        # Process results

1 Answer

Since you do not use any callback with apply_async and always call the same function (computeFeatures), you are better off using pool.map(). The way you are using it, apply_async does not execute the computations in parallel anyway, and on top of that it adds overhead to every single call.

示例:

from multiprocessing import Pool
from math import sqrt

p = Pool(8)
num_list = range(100000)

%%time
_ = p.map(sqrt, num_list)

CPU times: user 19 ms, sys: 2.36 ms, total: 21.4 ms

Wall time: 27.7 ms

%%time
_ = [sqrt(num) for num in num_list]

CPU times: user 33.7 ms, sys: 5.93 ms, total: 39.6 ms

Wall time: 37.5 ms

%%time
_ = [p.apply_async(sqrt, (num,)) for num in num_list]

CPU times: user 5.5 s, sys: 1.37 s, total: 6.87 s

Wall time: 5.95 s

As the example shows, simple computations are best handled with pool.map(). Change your code to use map and you should see some improvement. It is also important to find the right number of workers for your system and for the problem at hand.

result = pool.starmap(computeFeatures, tasks)  # starmap, since computeFeatures takes two arguments
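
To make the suggestion concrete, here is a minimal sketch of how the main block could be restructured around a single pool call. It assumes the computeFeatures function and the InputSimulation folder from the question, and uses starmap because computeFeatures takes two arguments; the worker count of 4 is only a placeholder you would tune for your machine.

import glob
import multiprocessing

import cv2

# assumes computeFeatures(image, chunk_num) from the question is defined in this module

if __name__ == '__main__':
    image_paths = glob.glob("InputSimulation/*.jpg")

    # read every image once in the parent process and pair it with its index
    tasks = [(cv2.imread(p, 0), idx) for idx, p in enumerate(image_paths)]

    with multiprocessing.Pool(processes=4) as pool:
        # a single call submits all tasks; starmap unpacks each (image, index) tuple
        results = pool.starmap(computeFeatures, tasks)

    print("processed", len(results), "images")

Note that in this sketch each image array is pickled and sent to a worker process; if that transfer dominates the runtime, passing the file path instead and calling cv2.imread inside the worker is a common way to cut the overhead.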
