批量bsub启动程序

gpu-batch-sub的Python项目详细描述


使用bsub创建批处理gpu作业的纯python脚本

安装/更新

pip install -U git+https://github.com/ferrine/gpu-batch.sub
# or
pip install -U gpu-batch-sub

示例配置

脚本(位置~/.gpubatch.conf)的默认配置应该是 喜欢

> cat config
[gpubatch]
batch=-1
gpu=1
; use ';' for comments
;paths are relative
out=bsub-log/out
err=bsub-log/err
hosts=host1 host2 host3
queue=normal
header=
    #BSUB option1
    #BSUB option2
    #BSUB option3
    custom code you want

示例

# batch by -1
# yields 1 job
> gpu-batch.sub 'python script_1.py' 'python script_2.py' 'python script_2.py --other-args'

# batch by 2
# yields 2 jobs
> gpu-batch.sub -b 2 'python script_1.py' 'python script_2.py' 'python script_2.py --other-args'

# run from file
> gpu-batch.sub -b 2 -f filewithjobs1 filewithjobs2 filewithjobs3
> cat filewithjobs1
multiline \ # comments are ok
    job number one
# comments here are ok too
multiline \
    job number two

# naming jobs
# special syntax is applied (no spaces allowed in jobname)
gpu-batch.sub 'jobname : python script1.py'

# running sequential jobs
# yields 2 jobs with sequentially running commands
> gpu-batch.sub -b 2 -s 'python script_1.py' 'python script_2.py' 'python script_2.py --other-args'

检查命令提交

--debug标志有助于将预期提交的内容打印到LSF

> gpu-batch.sub --debug --batch 2 command1 command2 named:command3
>>>>>>>>>>
#SUBMIT: 0
vvvvvvvvvv
#!/bin/sh
#BSUB -J gpu-batch.sub
#BSUB -o bsub-log/out/gpu-batch.sub-%J-stats.out
#BSUB -q normal
#BSUB -n 1
#BSUB -gpu "num=1:j_exclusive=no:mode=shared"
cd ${LS_SUBCWD}
mkdir -p bsub-log/out
mkdir -p bsub-log/err
{
command1 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-0.0.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-0.0.0.err ;
} & {
command2 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-0.1.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-0.1.0.err ;
} & wait

>>>>>>>>>>
#SUBMIT: 1
vvvvvvvvvv
#!/bin/sh
#BSUB -J gpu-batch.sub
#BSUB -o bsub-log/out/gpu-batch.sub-%J-stats.out
#BSUB -q normal
#BSUB -n 1
#BSUB -gpu "num=1:j_exclusive=no:mode=shared"
cd ${LS_SUBCWD}
mkdir -p bsub-log/out
mkdir -p bsub-log/err
{
command3 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-1.0.0-named.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-1.0.0-named.err ;
} & wait

从文件运行命令

> cat commands
command1
command2
<sequential> # indicates sequential jobs start
command3
command4
</sequential> # indicates sequential jobs end
> gpu-batch.sub --debug -b 2 -f commands
>>>>>>>>>>
#SUBMIT: 0
vvvvvvvvvv
#!/bin/sh
#BSUB -J gpu-batch.sub
#BSUB -o bsub-log/out/gpu-batch.sub-%J-stats.out
#BSUB -q normal
#BSUB -n 1
#BSUB -gpu "num=1:j_exclusive=no:mode=shared"
cd ${LS_SUBCWD}
mkdir -p bsub-log/out
mkdir -p bsub-log/err
{
command1 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-0.0.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-0.0.0.err ;
} & {
command2 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-0.1.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-0.1.0.err ;
} & wait

>>>>>>>>>>
#SUBMIT: 1
vvvvvvvvvv
#!/bin/sh
#BSUB -J gpu-batch.sub
#BSUB -o bsub-log/out/gpu-batch.sub-%J-stats.out
#BSUB -q normal
#BSUB -n 1
#BSUB -gpu "num=1:j_exclusive=no:mode=shared"
cd ${LS_SUBCWD}
mkdir -p bsub-log/out
mkdir -p bsub-log/err
{
command3 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-1.0.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-1.0.0.err ;
command4 >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-1.0.1.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-1.0.1.err ;
} & wait

运行单个命令,无引号

> gpu-batch.sub --debug -C python main.py
>>>>>>>>>>
#SUBMIT: 0
vvvvvvvvvv
#!/bin/sh
#BSUB -J gpu-batch.sub
#BSUB -o bsub-log/out/gpu-batch.sub-%J-stats.out
#BSUB -q normal
#BSUB -n 1
#BSUB -gpu "num=1:j_exclusive=no:mode=shared"
cd ${LS_SUBCWD}
mkdir -p bsub-log/out
mkdir -p bsub-log/err
{
python main.py >\
  bsub-log/out/gpu-batch.sub-${LSB_JOBID}-0.0.0.out 2> bsub-log/err/gpu-batch.sub-${LSB_JOBID}-0.0.0.err ;
} & wait

程序说明

usage: gpu-batch.sub [-h] [--batch BATCH] [--sequential] [--gpu GPU]
                     [--out OUT] [--err ERR] [--name NAME] [--hosts HOSTS]
                     [--files FILES [FILES ...]] [-C [C [C ...]]]
                     [--queue QUEUE] [--exclusive] [--debug]
                     [--bsub-bin BSUB_BIN] [--version]
                     [jobs [jobs ...]]

gpu-batch.sub is a util to wrap submissions to LSF in a batch. It
automatically collects jobs, prepares submission file you can check beforehand
with `--debug` flag. `gpu-batch.sub` asks LSF for desired number of GPU per
batch and allocates them in shared or exclusive (not recommended) mode.

positional arguments:
  jobs                  Jobs to execute (e.g. 'python script.py') enclosed as
                        strings, you can specify either files or explicit jobs
                        in command line. Multiline jobs in files are
                        supported. Optional naming schema for jobs has the
                        following syntax 'name:command' (default: [])

optional arguments:
  -h, --help            show this help message and exit
  --batch BATCH, -b BATCH
                        Number of jobs in batch where -1 stands for unlimited
                        batch (default: -1)
  --sequential, -s      Make all jobs sequential within bsub submit (default:
                        False)
  --gpu GPU, -g GPU     Number of gpu per batch (default: 1)
  --out OUT, -o OUT     Output path for stdout (default: bsub-log/out)
  --err ERR, -e ERR     Output path for stderr (default: bsub-log/err)
  --name NAME, -n NAME  Name for job, defaults to base directory of execution
                        (default: $(basename `pwd`))
  --hosts HOSTS         Space or comma separated allowed hosts. Empty string
                        holds for ALL visible hosts. It is suggested to
                        specify hosts in `.conf` file. Passing hosts in
                        command line looks like `--hosts ''` for ALL or
                        `--hosts 'host1,host2'` for 2 hosts (default: )
  --files FILES [FILES ...], -f FILES [FILES ...]
                        Read jobs from files. File can contain multiline jobs
                        for readability (default: [])
  -C [C [C ...]]        Single command, does not require quotes (default: [])
  --queue QUEUE, -q QUEUE
                        Queue name (default: normal)
  --exclusive, -x       Exclusive mode allocates GPU only for 1 separate job.
                        (default: no)
  --debug               Print submissions and exit (default: False)
  --bsub-bin BSUB_BIN   bsub binary path (default: bsub)
  --version             Print version and exit (default: False)

Default settings are stored in `$HOME/.gpubatch.conf`. They will override the
help message as well. Possible settings for config file: batch, gpu, hosts,
header, queue. Header will be appended to LSF submission file as is, there is
no default extra header.

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
解释java选择方法   连接到127.0.0.1的java间歇性故障,连接到IP(eth0)时没有故障   java如何优雅地杀死hadoop作业/intercept`hadoop作业杀死`   java如何通过引导类加载器以编程方式加载另一个类?   url Java:在查询参数之前使用片段构建URI   在BroadLeaf表blc_order_属性中保存OrderAttributes值时发生java错误   安卓将功能从xml转换为java   java如何将数据写入文件?   java JPA SQL结果映射   Java中整数对象比较运算符的引用安全性   Spring测试失败:java。lang.NoClassDefFoundError:org/springframework/cglib/transform/impl/memorysafuendecaredthrowableStrategy   rich:extendedDataTable中的java行选择和数据处理   java为什么我需要在volatile上对多个线程使用synchronized?   java尽管构建成功,但为什么会出现此错误?   数组$ArrayList不能转换为java。util。java中的ArrayList   java如何根据泛型类型调用方法?   java将JLabel添加到JPanel,将JPanel添加到JFrame   如果MapStruct中的源为null,则java将父目标设置为null   JavaJBossDrools从DRL插入事实   java不同的JRE安装(windows)