ipcluster on Sun Grid Engine: every engine only has rank 0

2016-07-15 14:47:09.749 [IPClusterStart] Starting ipcluster with [daemon=False]
2016-07-15 14:47:09.751 [IPClusterStart] Creating pid file: /home/USERNAME/.ipython/profile_sge/pid/ipcluster.pid
2016-07-15 14:47:09.751 [IPClusterStart] Starting Controller with SGEControllerLauncher
2016-07-15 14:47:09.789 [IPClusterStart] Job submitted with job id: u'6354583'
2016-07-15 14:47:10.790 [IPClusterStart] Starting 100 Engines with SGEEngineSetLauncher
2016-07-15 14:47:10.826 [IPClusterStart] Job submitted with job id: u'6354584'
2016-07-15 14:47:40.856 [IPClusterStart] Engines appear to have started successfully

[stdout:0] I am #0 of 1 and run on compute-8-13.local
[stdout:1] I am #0 of 1 and run on compute-8-13.local
[stdout:2] I am #0 of 1 and run on compute-3-3.local
[stdout:3] I am #0 of 1 and run on compute-3-3.local
[stdout:4] I am #0 of 1 and run on compute-3-3.local
...

c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
c.SlurmEngineSetLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.engine.template'
c.SlurmControllerLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.controller.template'

# /bin/sh
#$ -S /bin/sh
#$ -pe orte 1
#$ -q sThC.q
#$ -cwd
#$ -N ipyparallel_controller
#$ -o ipyparallel_controller.log
#$ -e ipyparallel_controller.err
module load gcc/5.3/openmpi
source activate parallel
ipcontroller --profile-dir={profile_dir}

# /bin/sh
#$ -S /bin/sh
#$ -pe orte {n}
#$ -q sThC.q
#$ -cwd
#$ -N ipyparallel_engines
#$ -o ipyparallel_engines.log
#$ -e ipyparallel_engines.err
module load gcc/5.3/openmpi
source activate parallel
mpiexec -n {n} ipengine --profile-dir={profile_dir} --timeout=30
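
For illustration, here is a minimal sketch of how the {n} and {profile_dir} placeholders in the engine template might be filled in before submission. It uses plain Python `str.format` as a stand-in; ipyparallel's own formatter may behave differently, and `render_template` is a hypothetical name, not an ipyparallel function. The template is shortened to the lines that contain placeholders.

```python
# Shortened copy of the engine template from the question.
# {n} and {profile_dir} are the placeholders to be substituted.
ENGINE_TEMPLATE = """\
#$ -S /bin/sh
#$ -pe orte {n}
#$ -N ipyparallel_engines
mpiexec -n {n} ipengine --profile-dir={profile_dir} --timeout=30
"""


def render_template(template, n, profile_dir):
    """Hypothetical helper: substitute the placeholders with plain str.format."""
    return template.format(n=n, profile_dir=profile_dir)


script = render_template(ENGINE_TEMPLATE, 100, "/home/USERNAME/.ipython/profile_sge")
print(script)
```

With n=100 the rendered script requests 100 slots in one job and starts 100 engines under a single mpiexec, which is the behaviour this template is meant to produce.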

1 Answer

User

#1 · Posted on 2024-05-21 00:08:47

I found the solution (and my mistake) myself:

In ipcluster_config.py, I forgot to rename some occurrences of Slurm -> SGE, so it should read:

c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
c.SGEEngineSetLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.engine.template'
c.SGEControllerLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.controller.template'

Because of this, ipcluster fell back to some default SGE template, which submitted 100 separate jobs instead of one job with 100 processes.

Now I get what I wanted:
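
A mistake like this is easy to miss, because options set on a launcher class that is never selected are simply ignored. Below is a minimal sketch of a check for it. `find_mismatched_launchers` is a hypothetical helper, not part of ipyparallel; it assumes the config uses full launcher class names, as in the lines above, and only does simple line-by-line pattern matching.

```python
import re

# Hypothetical helper, not part of ipyparallel: flags config lines that set
# options on a launcher class other than the ones actually selected.
LAUNCHER_CHOICE = re.compile(
    r"^c\.\w+\.(?:engine|controller)_launcher_class\s*=\s*['\"](\w+)['\"]"
)
LAUNCHER_OPTION = re.compile(r"^c\.(\w+Launcher)\.(\w+)\s*=")


def find_mismatched_launchers(config_text):
    """Return (class_name, option) pairs set on launchers that were not selected."""
    selected = set()
    options = []
    for line in config_text.splitlines():
        line = line.strip()
        choice = LAUNCHER_CHOICE.match(line)
        if choice:
            # Remember which launcher classes the config selects.
            selected.add(choice.group(1))
            continue
        option = LAUNCHER_OPTION.match(line)
        if option:
            # Remember every option set on some launcher class.
            options.append((option.group(1), option.group(2)))
    return [(cls, opt) for cls, opt in options if cls not in selected]


broken_config = """
c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
c.SlurmEngineSetLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.engine.template'
c.SlurmControllerLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.controller.template'
"""

for cls, opt in find_mismatched_launchers(broken_config):
    print("Option %s is set on %s, which is not a selected launcher" % (opt, cls))
```

Run against the original config, this reports both Slurm lines; run against the corrected config, it reports nothing.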

[stdout:0] I am #5 of 100 and run on compute-5-17.local
[stdout:1] I am #9 of 100 and run on compute-5-17.local
[stdout:2] I am #1 of 100 and run on compute-5-17.local
[stdout:3] I am #7 of 100 and run on compute-5-17.local
[stdout:4] I am #2 of 100 and run on compute-5-17.local
...
