Python multiprocessing return di

def toParallel(ht, token):
    keys = []
    words = token[token['hashtag'] == ht]['word']
    for w in words:
        keys.append(checkString(w))
    y = {ht:keys}

num_cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(num_cores)

token = pd.read_csv('/path', sep=",", header = None, encoding='utf-8')
token.columns = ['word', 'hashtag', 'count']
hashtag = pd.DataFrame(token.groupby(by='hashtag', as_index=False).count()['hashtag'])

result = pd.DataFrame(index = hashtag['hashtag'], columns = range(0, 21))
result = result.fillna(0)

final_result = []
final_result = [pool.apply_async(toParallel, args=(ht,token,)) for ht in hashtag['hashtag']]

1 Answer

User

#1 · Posted 2024-05-23 21:12:51

final_result = [pool.apply_async(toParallel, args=(ht,token,)) for ht in hashtag['hashtag']]

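You can either use Pool.apply() and get the result right away (in which case you don't need multiprocessing at all, heh; that function is only there for completeness), or use Pool.apply_async() followed by get() on the object it returns. Pool.apply_async() is asynchronous, so it hands back an AsyncResult rather than the function's return value.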

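Something like this (a sketch reusing the names from the question):

workers = [pool.apply_async(toParallel, args=(ht, token)) for ht in hashtag['hashtag']]
final_result = [worker.get() for worker in workers]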
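For a version that runs without the CSV file from the question, here is a minimal sketch. The hard-coded ROWS list and the check_string helper are stand-ins made up for illustration; they are not the asker's data or their checkString(). It shows that apply_async() returns AsyncResult objects, that get() on each one yields the dict the worker returned, and one possible way to fold those one-entry dicts into a single dict.

```python
import multiprocessing

# Hypothetical stand-in data: (word, hashtag) pairs instead of the CSV.
ROWS = [("Foo", "#a"), ("Bar", "#a"), ("Baz", "#b")]


def check_string(word):
    # Hypothetical stand-in for the asker's checkString().
    return word.lower()


def to_parallel(ht, rows):
    # Build a one-entry dict {hashtag: [processed words]} and return it.
    keys = [check_string(w) for w, tag in rows if tag == ht]
    return {ht: keys}


def merge(dicts):
    # Combine the one-entry dicts into a single dict.
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


if __name__ == "__main__":
    hashtags = ["#a", "#b"]
    with multiprocessing.Pool() as pool:
        # Each call returns an AsyncResult immediately, not the dict itself.
        workers = [pool.apply_async(to_parallel, args=(ht, ROWS)) for ht in hashtags]
        # get() blocks until that task has finished and returns its value.
        final_result = merge(w.get() for w in workers)
    print(final_result)
```

Note that the worker has to actually return the dict; whatever the function returns is what get() gives you back.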

Alternatively, you can use Pool.map(), which does all of this for you.
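A rough sketch of the Pool.map() route, again with made-up stand-in data (ROWS) and a hypothetical worker name (words_for) rather than the asker's code. Since map() passes exactly one argument per call, functools.partial is used here to pin the second argument; Pool.starmap() would be another option.

```python
import multiprocessing
from functools import partial

# Hypothetical stand-in data: (word, hashtag) pairs instead of the CSV.
ROWS = [("Foo", "#a"), ("Bar", "#a"), ("Baz", "#b")]


def words_for(ht, rows):
    # Return a one-entry dict {hashtag: [words tagged with it]}.
    return {ht: [w for w, tag in rows if tag == ht]}


def run(hashtags, rows):
    # partial() fixes `rows`, so map() only has to supply the hashtag.
    worker = partial(words_for, rows=rows)
    with multiprocessing.Pool() as pool:
        # map() blocks until every task is done and returns the results
        # as a plain list, in the same order as `hashtags`.
        return pool.map(worker, hashtags)


if __name__ == "__main__":
    print(run(["#a", "#b"], ROWS))
```

With map() there are no AsyncResult objects to unwrap; you get the returned dicts directly.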

Either way, I suggest you read the multiprocessing documentation carefully.

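Addendum: in answering this question I assumed the OP is on some Unix-like operating system, such as Linux or OSX. If you are on Windows, don't forget to guard the parent/worker code with if __name__ == '__main__'. Windows has no fork(), so a child process starts from the top of the file rather than at the point of the fork as it does on Unix, and the if condition is what tells it which part to run.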
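A minimal sketch of that guard, using a toy worker (square) that is not part of the question. The same layout applies wherever child processes are spawned rather than forked.

```python
import multiprocessing


def square(x):
    # Worker function: defined at module level so child processes can
    # find it when they import this file.
    return x * x


def main():
    with multiprocessing.Pool() as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # Only the parent process gets here. A spawned child re-imports the
    # module under a different __name__, so it skips this block instead
    # of creating a pool of its own.
    main()
```

Without the guard, each spawned child would try to run the pool-creating code again when it imports the file.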

P.S. This is unnecessary:

num_cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(num_cores)

If you call multiprocessing.Pool() with no argument (or with None), it already creates a pool of worker processes sized to the number of CPUs.
