Efficiently concatenating the results of concurrent.futures parallel execution?

Posted 2024-04-29 09:01:46


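I have a pandas DataFrame with roughly 100 million rows. Parallel processing runs very well on a multi-core machine, with every core at 100% utilization. However, the result of executor.map() is a generator, so to actually collect the processed results I iterate over that generator. This is very, very slow (hours), partly because it runs on a single core and partly because of the loop. In fact, it is much slower than the actual processing in my_function().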

Is there a better way (perhaps concurrent and/or vectorized)?

Edit: I am using pandas 0.23.4 (the latest version at the moment) with Python 3.7.0.

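import concurrent.futures  # a bare "import concurrent" does not load the futures submodule
import pandas as pd

def collect_results(list_of_values):
    # my_function is defined elsewhere; pd.concat below expects it to return a pandas object
    df = pd.DataFrame({'col1': [], 'col2': [], 'col3': []})

    with concurrent.futures.ProcessPoolExecutor() as executor:
        gen = executor.map(my_function, list_of_values, chunksize=1000)

    # the following is single-threaded and also very slow
    for x in gen:
        df = pd.concat([df, x])  # anything better than doing this?
    return df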


1 Answer

User

#1 · Posted 2024-04-29 09:01:46

Here is a benchmark that is relevant to your case: https://stackoverflow.com/a/31713471/5588279

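As you can see, calling concat (or append) many times is very inefficient. You should call pd.concat(gen) once instead. I believe the underlying implementation will preallocate all the memory it needs.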
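As a minimal sketch of that suggestion, the generator returned by executor.map() can be handed to a single pd.concat call. The my_function body and the combine helper below are hypothetical stand-ins (the question does not show my_function), and the input values are made up for illustration.

```python
import concurrent.futures

import pandas as pd


def my_function(value):
    # Hypothetical stand-in for the question's my_function:
    # returns a one-row DataFrame for each input value.
    return pd.DataFrame({"col1": [value], "col2": [value * 2], "col3": [value * 3]})


def combine(results):
    # Hypothetical helper: a single concat over all pieces,
    # instead of one concat per piece inside a loop.
    # pd.concat accepts any iterable of DataFrames, including a generator.
    return pd.concat(results)


if __name__ == "__main__":
    list_of_values = range(10)  # made-up input for illustration
    with concurrent.futures.ProcessPoolExecutor() as executor:
        gen = executor.map(my_function, list_of_values, chunksize=2)
        df = combine(gen)
    print(df.shape)
```

Each piece keeps its own index here, as in the question's loop; passing ignore_index=True to pd.concat would renumber the rows if the original indices are not needed.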

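In your example, memory is allocated again on every iteration, and each pd.concat call copies all the rows accumulated so far into a new DataFrame.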
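The difference can be checked with a rough timing sketch like the one below. The frame sizes and the make_frames, concat_in_loop, and concat_once helpers are assumptions chosen for illustration, not taken from the question or the linked benchmark, and the timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def make_frames(n_frames, rows_per_frame=100):
    # Hypothetical stand-in for the per-task results of the worker function.
    return [
        pd.DataFrame({"col1": list(range(rows_per_frame)), "col2": i})
        for i in range(n_frames)
    ]


def concat_in_loop(frames):
    # Pattern from the question: every iteration builds a new DataFrame
    # that contains all rows accumulated so far plus the next piece.
    df = frames[0]
    for x in frames[1:]:
        df = pd.concat([df, x])
    return df


def concat_once(frames):
    # Pattern from the answer: one call over all pieces.
    return pd.concat(frames)


if __name__ == "__main__":
    frames = make_frames(500)
    for fn in (concat_in_loop, concat_once):
        start = time.perf_counter()
        result = fn(frames)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(result)} rows in {elapsed:.3f}s")
```

Both functions produce the same DataFrame; only the amount of copying differs, and that cost grows with the number of pieces in the loop version.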
