Running a job on multiple nodes of a GridEngine cluster

Asked 2025-04-16 05:04

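I have a 128-core cluster on which I'd like to run a job that can be parallelized. The cluster uses Sun GridEngine, and my program is written with Parallel Python, numpy and scipy on Python 2.5.8. Running the job on a single node (4 cores) gives roughly a 3.5× speedup over a single core. Now I'd like to go further and spread the job across about 4 nodes. My qsub script looks roughly like this: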

#!/bin/bash
# The name of the job; it can be anything that makes sense to you.
#$ -N jobname

# The job should be placed into the queue 'all.q'.
#$ -q all.q

# Redirect output stream to this file.
#$ -o jobname_output.dat

# Redirect error stream to this file.
#$ -e jobname_error.dat

# The batch system should use the current directory as the working directory.
# Both output files will be placed in this directory, and the batch system
# expects to find the executable here.
#$ -cwd

# Request the Bourne shell as the job's shell.
#$ -S /bin/sh

# print date and time
date

# spython is the server's version of Python 2.5. Using python instead of spython causes the program to run in python 2.3
spython programname.py

# print date and time again
date

Does anyone know how to do this?
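For a rough idea of what 4 nodes might buy: if the measured 3.5× speedup on 4 cores is read through Amdahl's law, S = 1 / ((1 - p) + p/n), it implies a parallelizable fraction p of about 0.95, and the same law then predicts about 9.3× on 16 cores. This is only an estimate under an assumed model: Amdahl's law ignores the extra network and Parallel Python communication cost of going across nodes, so a real multi-node run would likely do somewhat worse. The helper names below are illustrative.

```python
def amdahl_parallel_fraction(speedup, n):
    """Solve S = 1 / ((1 - p) + p / n) for p, the parallelizable fraction."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


def amdahl_speedup(p, n):
    """Predicted speedup on n cores for a program whose parallel fraction is p."""
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = amdahl_parallel_fraction(3.5, 4)   # measured: 3.5x on one 4-core node
    print("parallel fraction: %.3f" % p)  # 0.952
    print("16 cores: %.2fx" % amdahl_speedup(p, 16))  # 9.33x
```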

Tags: numpy scipy parallel-processing job-scheduling cluster-computing gridengine

1 Answer

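Yes. You need to request parallel slots from Grid Engine with the -pe option, which takes the name of a parallel environment and the number of slots. You can put it in your script like this:

# Use 16 processors (slots). Replace <pe_name> with a parallel environment
# configured on your cluster; qconf -spl lists the available ones.
#$ -pe <pe_name> 16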

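Or pass the option on the command line when you submit the script. A more permanent approach is to put it in a .sge_request file (Grid Engine reads one from your home directory and from the submission directory).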

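In every GE installation I have used, this gives you 16 processors (nowadays usually called processor cores), packed onto as few nodes as possible. So if your nodes have 4 cores each, you get 4 nodes; if they have 8 cores each, only 2 nodes; and so on. How slots are spread over nodes is governed by the parallel environment's allocation_rule ($fill_up packs nodes, $round_robin spreads across them). If you want 2 cores on each of 8 nodes (for example because each process needs a lot of memory), it is a bit more involved, since it usually needs a parallel environment with a fixed allocation_rule of 2, so check with your support team.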
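Reserving 16 slots does not by itself make a Parallel Python program use the other nodes: pp only uses the local machine unless it is given a ppservers list, and ppserver.py has to be running on each remote host. Inside a parallel job, Grid Engine writes the granted hosts to the file named by the PE_HOSTFILE environment variable. The sketch below, an illustration rather than a tested recipe, parses that file and builds a ppservers tuple. It assumes the usual one-line-per-host layout (host name, slot count, queue instance, processor range) and that ppserver.py was started on every other host on port 60000 (pp's default); the helper names read_pe_hostfile, short_name and ppservers_for are hypothetical. It avoids the with statement and multi-argument print so it also runs on Python 2.5.

```python
import os
import socket


def read_pe_hostfile(path):
    """Return a list of (host, slots) pairs from a Grid Engine PE hostfile.

    Assumed line layout: "<host> <slots> <queue@host> <processor range>".
    """
    hosts = []
    f = open(path)
    try:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                hosts.append((fields[0], int(fields[1])))
    finally:
        f.close()
    return hosts


def short_name(host):
    """Strip the domain so 'node01.cluster' and 'node01' compare equal."""
    return host.split(".")[0]


def ppservers_for(hosts, local_host, port=60000):
    """Build a Parallel Python ppservers tuple of "host:port" strings.

    The local host is skipped because pp already uses the local cores.
    port=60000 is an assumption: it must match how ppserver.py was started.
    """
    return tuple("%s:%d" % (host, port)
                 for host, _ in hosts
                 if short_name(host) != short_name(local_host))


if __name__ == "__main__":
    path = os.environ.get("PE_HOSTFILE")
    if path is None:
        print("PE_HOSTFILE is not set; run this inside a parallel Grid Engine job")
    else:
        hosts = read_pe_hostfile(path)
        servers = ppservers_for(hosts, socket.gethostname())
        print("granted slots: %d" % sum([slots for _, slots in hosts]))
        print("ppservers: %r" % (servers,))
        # With Parallel Python installed, the tuple can be passed on, e.g.:
        #   import pp
        #   job_server = pp.Server(ppservers=servers)
```

The ppserver.py processes on the other hosts must be started before the program connects to them, for instance with qrsh -inherit from the job script if the parallel environment permits it. Whether that is allowed is site-specific, so it is worth confirming with the cluster administrators.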

Answered 2025-04-16 by Python大师
