Threading or multiprocessing? Returning a very large array

Posted 2024-06-09 01:44:53


I'm new to multiprocessing. Any help would be greatly appreciated.

import multiprocessing as mp

def func():
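    with mp.Pool(initializer=init, initargs=(a, b)) as p:
        temp_arr = p.starmap(process, tuple_list)
        # p.close(); p.join() are not needed here: starmap blocks until all
        # results are in, and leaving the with-block terminates the pool

    # element-wise sum of the arrays returned by the tasks
    arr = [sum(x) for x in zip(*temp_arr)]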

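tuple_list has millions of tuples, which is why I'm trying multiprocessing here. As I understand it, this is a CPU-bound task (a faster CPU makes the computation faster) rather than an I/O-bound one, so I'm going with multiprocessing. Please correct me if I'm wrong.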
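With millions of tuples, returning one large array per tuple means the parent has to receive and sum millions of arrays. A minimal sketch of one way around that, assuming the per-tuple results only ever get summed: split the tuples into a few batches and have each worker return a single partial sum per batch. `N`, `contribution`, `process_batch`, and `make_batches` are hypothetical names, and the toy arithmetic only stands in for the real computation.

```python
import multiprocessing as mp
import numpy as np

N = 8  # hypothetical array length; the real one is much larger


def contribution(a1, a2):
    """Toy stand-in for the real per-tuple computation."""
    out = np.zeros(N)
    out[a1 % N] += a2
    return out


def process_batch(batch):
    """Accumulate a whole batch into one array, so only one array is returned."""
    partial = np.zeros(N)
    for a1, a2 in batch:
        partial += contribution(a1, a2)
    return partial


def make_batches(items, n_batches):
    """Split items into at most n_batches contiguous slices."""
    size = max(1, -(-len(items) // n_batches))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def func(tuple_list, n_workers=2):
    batches = make_batches(tuple_list, n_workers * 4)
    with mp.Pool(n_workers) as p:
        partials = p.map(process_batch, batches)
    return np.sum(partials, axis=0)  # element-wise sum of the few partials


if __name__ == "__main__":
    tuples = [(i, float(i)) for i in range(1000)]
    print(func(tuples))
```

The number of arrays crossing the process boundary is now the number of batches, not the number of tuples; the batch count (`n_workers * 4` here) is an arbitrary choice for illustration.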

def init(tempa, tempb):
    global a, b
    a = tempa
    b = tempb

init() seems to add extra overhead. My actual code has more variables and two arrays to pass to init. Can this be done better?
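For context, the initializer runs once in each worker process when that worker starts, not once per task, so its cost scales with the number of workers rather than with the number of tuples. A small sketch of the pattern, using a hypothetical `_state` dict in place of several separate globals; the names and values are assumptions for illustration.

```python
import multiprocessing as mp
import os

_state = {}  # per-process storage filled by the initializer


def init(a, b):
    """Runs once in each worker when it starts; stores the read-only inputs."""
    _state["a"] = a
    _state["b"] = b
    _state["pid"] = os.getpid()


def process(a1, a2):
    """Uses the values stored by init and reports which worker ran the task."""
    return _state["a"] * a1 + _state["b"] * a2, _state["pid"]


if __name__ == "__main__":
    tasks = [(i, i + 1) for i in range(1000)]
    with mp.Pool(processes=2, initializer=init, initargs=(10, 100)) as p:
        results = p.starmap(process, tasks)
    worker_pids = {pid for _, pid in results}
    # Each worker called init once at startup; the tasks only read _state.
    print(len(tasks), "tasks handled by", len(worker_pids), "workers")
```

Keeping everything in one dict means adding another shared variable only touches `init`, not a growing `global` statement.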

def process(a1, a2):
    arr = np.zeros(N)  # N: some (large) length; needs `import numpy as np`
    # uses a, b, a1, a2 here; 'arr' (very large) is modified
    return arr

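The problem is that 'arr' is quite large, and each process needs its own arr. Is there an efficient way to handle this with multiprocessing? For example, could arr be initialized in func() and accessed directly by each process() call? Race conditions would of course need to be taken care of. Or are there other options?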
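One way to let every worker see an array created in func() is the standard-library `multiprocessing.shared_memory` module. The following is a sketch under assumptions, not a tuned implementation: `N`, `add_into`, and `process_batch` are hypothetical names, the arithmetic is a toy stand-in, and a lock guards the read-modify-write on the shared array to avoid the race condition.

```python
import multiprocessing as mp
from multiprocessing import shared_memory
import numpy as np

N = 8  # hypothetical array length


def add_into(arr, lock, batch):
    """Accumulate a batch locally, then add it to the shared array under the lock."""
    local = np.zeros(N)
    for a1, a2 in batch:
        local[a1 % N] += a2  # toy stand-in for the real computation
    with lock:  # keeps concurrent "+=" on the shared array from racing
        arr[:] += local


def init(shm_name, lock):
    """Attach each worker to the shared block once and keep a NumPy view on it."""
    global _shm, _arr, _lock
    _shm = shared_memory.SharedMemory(name=shm_name)  # keep the handle alive
    _arr = np.ndarray((N,), dtype=np.float64, buffer=_shm.buf)
    _lock = lock


def process_batch(batch):
    add_into(_arr, _lock, batch)


def func(batches, n_workers=2):
    nbytes = N * np.dtype(np.float64).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        arr = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
        arr[:] = 0.0
        lock = mp.Lock()
        with mp.Pool(n_workers, initializer=init, initargs=(shm.name, lock)) as p:
            p.map(process_batch, batches)
        result = arr.copy()  # copy out before the block is released
        del arr              # drop the view so the buffer can be closed
        return result
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    data = [[(i, 1.0) for i in range(100)] for _ in range(8)]
    print(func(data))
```

Accumulating into a local array first keeps the locked section short; taking the lock for every single tuple would serialize the workers.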

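Also, would multithreading be a good fit here? Again, I could create arr in func() and have all threads access it, but I don't know how much of a speedup threads would give, and I'm even less familiar with implementing threading.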
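For comparison, a threaded version might look like the sketch below. Threads share the parent's memory, so arr can simply live in func(); whether this is faster depends on how much of the work happens inside NumPy calls that release the GIL, since pure-Python loops will not run in parallel. The batch layout and the `np.bincount` arithmetic are assumptions standing in for the real computation.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import numpy as np


def func(batches, n, n_threads=4):
    arr = np.zeros(n)        # one array, visible to every thread
    lock = threading.Lock()  # protects the final "+=" on arr

    def process_batch(batch):
        a1 = np.asarray([t[0] for t in batch])
        a2 = np.asarray([t[1] for t in batch], dtype=float)
        # toy stand-in for the real vectorized computation
        local = np.bincount(a1 % n, weights=a2, minlength=n)
        with lock:
            arr[:] += local

    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        # list() drains the iterator so exceptions raised in a thread surface here
        list(ex.map(process_batch, batches))
    return arr
```

A quick check with two small batches:

```python
out = func([[(0, 1.0), (9, 2.0)], [(1, 3.0), (8, 4.0)]], 8)
```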

Edit:

import multiprocessing as mp
import numpy as np

def compute_gaussian_str_func_one_particle(Natoms, particle_types, cell_list, num_classes=2, rmax=5.0, sigma=0.1):
    neighbs = cell_list.query(cell_list.points, dict(mode="ball", r_max=rmax, r_min=0.01))
    radii = np.arange(0.0, rmax, sigma)
    global output
    output = np.zeros(num_classes * len(radii) * Natoms)

    with mp.Pool(initializer=init_pool, initargs=[num_classes, radii, sigma, particle_types, Natoms]) as p:
        a = p.starmap(process, neighbs)  # one full-size array comes back per task
        p.close()
        p.join()

    # element-wise sum of the arrays returned by the tasks
    output = [sum(x) for x in zip(*a)]
    return output

def process(a1, a2, a3):
    # a1: particle index, a2: neighbor index, a3: distance between them
    neighb_type = particle_types[a2]
    dist = a3
    c = a1 * num_classes * len(radii)  # start of this particle's block in output
    output[c + neighb_type * len(radii) : c + (neighb_type + 1) * len(radii)] += np.exp(-(dist - radii)**2 / (2 * sigma**2))
    return output

def init_pool(a, b, c, d, e):
    global num_classes, radii, sigma, particle_types, Natoms
    num_classes = a
    radii = b
    sigma = c
    particle_types = d
    Natoms = e
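One more option for the edited code, offered as a sketch rather than as the intended design: each neighbor pair only touches a slice of length len(radii), so the whole accumulation can be written as a single vectorized NumPy scatter-add in the parent, with no pool and no large array returned per task. This swaps multiprocessing for vectorization. It assumes the pairs are available as three arrays `i`, `j`, and `dist` (hypothetical names for the values that process() receives as a1, a2, a3); extracting them from `cell_list.query(...)` is not shown.

```python
import numpy as np


def gaussian_str_func(i, j, dist, particle_types, n_atoms,
                      num_classes=2, rmax=5.0, sigma=0.1):
    i = np.asarray(i)
    j = np.asarray(j)
    dist = np.asarray(dist, dtype=float)
    particle_types = np.asarray(particle_types)
    radii = np.arange(0.0, rmax, sigma)
    # One Gaussian profile per pair: shape (n_pairs, len(radii)).
    contrib = np.exp(-(dist[:, None] - radii[None, :]) ** 2 / (2 * sigma ** 2))
    out = np.zeros((n_atoms, num_classes, len(radii)))
    # Unbuffered scatter-add, so repeated (atom, type) pairs all accumulate.
    np.add.at(out, (i, particle_types[j]), contrib)
    # Flatten to the same layout as the original 1-D `output`.
    return out.ravel()
```

The intermediate `contrib` array holds n_pairs * len(radii) floats, so for very large systems the pairs would need to be fed through in chunks, with `out` accumulated across chunks.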


0 answers

No answers yet.
