Threadpool in Python is not as fast as expected

from multiprocessing.dummy import Pool as ThreadPool
from threading import Thread, current_thread
from functools import partial

# df (a pandas DataFrame) and bag_of_words (a list of vocabulary words) are defined earlier in the script
data = df['text']
rev = df['stars']
y = []

def product_helper(args):
    # Unpack a (text, stars) tuple into featureExtraction's two parameters
    return featureExtraction(*args)

def featureExtraction(p, t):
    # Count how often each word of bag_of_words occurs in the text p
    temp = [0] * len(bag_of_words)
    for word in p.split():
        if word in bag_of_words:
            temp[bag_of_words.index(word)] += 1
    return temp

# function to be mapped over
def calculateParallel(threads):
    pool = ThreadPool(threads)
    job_args = [(item_a, rev[i]) for i, item_a in enumerate(data)]
    l = pool.map(product_helper, job_args)
    pool.close()
    pool.join()
    return l

temp_X = calculateParallel(12)

2 answers

User

Answer 1 · edited 2024-04-23 17:49:29

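Your code as a whole is CPU-bound rather than I/O-bound. You are only using threads, and threads are subject to the GIL, so only one thread executes Python code at a time and you pay thread-switching overhead on top of that. Everything effectively runs on a single core. To run on multiple cores, use: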

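import multiprocessing
pool = multiprocessing.Pool()  # one worker process per CPU by default
l = pool.map_async(product_helper, job_args).get()  # .get() waits for and returns the list of results
pool.close()
pool.join()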
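A minimal, self-contained sketch of the same approach, assuming the per-row work is the question's featureExtraction logic. BAG_OF_WORDS and TEXTS are hypothetical toy data standing in for bag_of_words and df['text'], and the unused stars argument is dropped. The __main__ guard matters on platforms that start workers with "spawn" (Windows, macOS), because each worker re-imports the module:

```python
import multiprocessing

# Hypothetical toy data standing in for bag_of_words and df['text'] from the question
BAG_OF_WORDS = ["good", "bad", "food", "service"]
TEXTS = ["good food good service", "bad service", "food"]


def feature_extraction(text, bag_of_words=BAG_OF_WORDS):
    """Count how often each vocabulary word occurs in text (same logic as featureExtraction)."""
    counts = [0] * len(bag_of_words)
    for word in text.split():
        if word in bag_of_words:
            counts[bag_of_words.index(word)] += 1
    return counts


if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so the CPU-bound counting can run on several cores at once.
    with multiprocessing.Pool() as pool:
        features = pool.map(feature_extraction, TEXTS)
    print(features)  # [[2, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 0]]
```

With many short rows, passing a chunksize to pool.map (for example pool.map(feature_extraction, texts, chunksize=1000), where 1000 is only an illustrative value) can cut the per-task overhead of sending work to the worker processes.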

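from multiprocessing.dummy import Pool as ThreadPool only gives you a wrapper around the threading module: multiprocessing.dummy copies the multiprocessing API, but its workers are threads, not processes. It uses only one core, not more.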
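A small check of that claim, as a sketch: every task submitted to a multiprocessing.dummy pool reports the same process id as the main program, because the workers are threads inside one process. The helper name who_am_i is hypothetical:

```python
import os
import threading
from multiprocessing.dummy import Pool as ThreadPool


def who_am_i(_):
    # Report the process id and the thread name of the worker that ran this task
    return os.getpid(), threading.current_thread().name


with ThreadPool(4) as pool:
    results = pool.map(who_am_i, range(8))

pids = {pid for pid, _ in results}
print("distinct process ids:", len(pids))  # distinct process ids: 1
print("same as main process:", pids == {os.getpid()})  # same as main process: True
```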

User

Answer 2 · edited 2024-04-23 17:49:29

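Python and threads do not work well together. There is a well-known issue called the GIL (Global Interpreter Lock). Basically, the interpreter holds a lock that prevents threads from executing Python code in parallel, even if you have multiple CPU cores. Python simply gives each thread a few milliseconds of CPU time, one after another, and the slowdown you see comes from the overhead of context switching between these threads.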
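A rough timing sketch of this effect, assuming a standard CPython build with the GIL. cpu_task is a pure-Python loop that holds the GIL while it runs, while io_task uses time.sleep as a stand-in for I/O waiting, which releases the GIL. One would typically expect the threaded CPU run to take about as long as the serial one, and the threaded sleep run to finish in roughly the time of a single sleep; exact numbers depend on the machine, so no output is shown:

```python
import sys
import time
from multiprocessing.dummy import Pool as ThreadPool


def cpu_task(n):
    """CPU-bound: a pure-Python loop that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i
    return total


def io_task(seconds):
    """I/O-bound stand-in: time.sleep releases the GIL while it waits."""
    time.sleep(seconds)
    return seconds


def timed(label, fn):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result


if __name__ == "__main__":
    # How often CPython asks the running thread to hand over the GIL (0.005 s by default)
    print("switch interval:", sys.getswitchinterval())

    cpu_jobs = [2_000_000] * 4
    io_jobs = [0.2] * 4
    with ThreadPool(4) as pool:
        timed("CPU serial", lambda: [cpu_task(n) for n in cpu_jobs])
        timed("CPU 4 threads", lambda: pool.map(cpu_task, cpu_jobs))
        timed("IO serial", lambda: [io_task(s) for s in io_jobs])
        timed("IO 4 threads", lambda: pool.map(io_task, io_jobs))
```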

Here is a very good document that explains how it works: http://www.dabeaz.com/python/UnderstandingGIL.pdf

To solve your problem, I suggest trying multiprocessing: https://pymotw.com/2/multiprocessing/basics.html

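Note: multiprocessing is not 100% equivalent to multithreading. Multiprocessing does run in parallel, but separate processes do not share memory, so if you change a variable in one process, it is not changed in the other.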
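A minimal sketch of the "no shared memory" point: a child process increments a module-level counter and reports the value it sees through a multiprocessing.Queue, while the parent's copy stays unchanged. The names counter, increment_in_child and run_demo are hypothetical. The result is read from the queue before join() so the parent does not wait on a child that is still flushing its queue:

```python
import multiprocessing

counter = 0  # module-level variable; every process has its own copy


def increment_in_child(queue):
    """Runs in the child process: change the child's copy and send it back."""
    global counter
    counter += 1
    queue.put(counter)


def run_demo():
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=increment_in_child, args=(queue,))
    proc.start()
    child_value = queue.get()  # value of counter inside the child
    proc.join()
    return child_value, counter  # counter here is still the parent's copy


if __name__ == "__main__":
    child_value, parent_value = run_demo()
    print("child saw:", child_value)  # child saw: 1
    print("parent sees:", parent_value)  # parent sees: 0
```

To share results, send them back explicitly (as pool.map does with return values) or use the shared-state tools in the multiprocessing module.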