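I'm new to dask, and I've found it to be a module that makes parallelization easy. I'm working on a project in which I was able to parallelize a loop on a single machine, as you can see here. However, I would like to move to dask.distributed. I applied the following changes to the class above: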
diff --git a/mlchem/fingerprints/gaussian.py b/mlchem/fingerprints/gaussian.py
index ce6a72b..89f8638 100644
--- a/mlchem/fingerprints/gaussian.py
+++ b/mlchem/fingerprints/gaussian.py
@@ -6,7 +6,7 @@ from sklearn.externals import joblib
 from .cutoff import Cosine
 from collections import OrderedDict
 import dask
-import dask.multiprocessing
+from dask.distributed import Client
 import time
@@ -141,13 +141,14 @@ class Gaussian(object):
         for image in images.items():
             computations.append(self.fingerprints_per_image(image))
+        client = Client()
         if self.scaler is None:
-            feature_space = dask.compute(*computations, scheduler='processes',
+            feature_space = dask.compute(*computations, scheduler='distributed',
                                          num_workers=self.cores)
             feature_space = OrderedDict(feature_space)
         else:
             stacked_features = dask.compute(*computations,
-                                            scheduler='processes',
+                                            scheduler='distributed',
                                             num_workers=self.cores)
             stacked_features = numpy.array(stacked_features)
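Doing so produces an error. I tried different ways of adding if __name__ == '__main__':, but without success. This can be reproduced by running this example. I would appreciate it if anyone could help me solve this; I don't know how to change my code to make it work.
Thanks.
Edit: the example is cu_training.py.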
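Client starts new processes, so it must be created inside an if __name__ == '__main__': block, as described in this SO question or this GitHub issue. This is the same requirement as with the multiprocessing module.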
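As a minimal sketch of what that looks like (not the asker's actual code): the function fingerprint below is a hypothetical stand-in for Gaussian.fingerprints_per_image, and the worker counts are arbitrary. It assumes the Client is created in the entry script (e.g. cu_training.py) rather than inside the class, and that the worker count moves from dask.compute to the Client constructor.

```python
import dask
from dask.distributed import Client


@dask.delayed
def fingerprint(image_id):
    # Hypothetical stand-in for Gaussian.fingerprints_per_image:
    # returns a (key, value) pair, like the items of the feature space.
    return image_id, image_id ** 2


def main():
    # Build the task graph lazily; nothing is computed yet.
    computations = [fingerprint(i) for i in range(4)]

    # Creating a Client with no address starts a local cluster, which
    # launches worker processes. That is why this code must only be
    # reachable from the __main__ guard below.
    with Client(n_workers=2, threads_per_worker=1):
        # While a Client is active it is the default scheduler, so
        # dask.compute sends the tasks to its workers.
        results = dask.compute(*computations)

    print(dict(results))


if __name__ == "__main__":
    main()
```

With this layout, the module can be imported by the worker processes without creating another Client, which is the situation the guard is meant to prevent.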