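A pipeline to construct a genome catalogue from metagenomics data
Detailed description of the metapi Python project
metapi
Hello, metagenomics!
Sibling project
Motivation
We all need a metagenomics pipeline for academic research.
Principle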
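- Pool collective wisdom
- GitHub
- Why are we here?
- Don't reinvent the wheel
- Make full use of the pipeline execution engine
- Make full use of excellent bioinformatics tools
- Robust, modular, extensible, updatable
- One rule, one module
- One module, one analysis
- Pull requests are welcome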
Design
Execution module
# Snakefile
include: "rules/step.smk"
include: "rules/simulation.smk"
include: "rules/fastqc.smk"
include: "rules/trimming.smk"
include: "rules/rmhost.smk"
include: "rules/assembly.smk"
include: "rules/alignment.smk"
include: "rules/binning.smk"
include: "rules/cobinning.smk"
include: "rules/checkm.smk"
include: "rules/dereplication.smk"
include: "rules/classification.smk"
include: "rules/annotation.smk"
include: "rules/profilling.smk"
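Analysis module
- Raw data report
- Quality control
- Host sequence removal
- Assembly
- Assembly evaluation
- Binning
- CheckM
- Dereplication
- Bin profiling
- Taxonomic classification
- Genome annotation
- Function annotation
Test module
- Execution test
- Analysis test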
Installation
Install dependencies*
- snakemake
- pigz
- ncbi-genome-download
- InSilicoSeq
- OAFilter
- sickle
- fastp
- MultiQC
- bwa
- samtools
- spades
- idba
- megahit
- quast
- MetaBat
- MaxBin2
- CheckM
- drep
- prokka
- metaphlan2
# in python3 environment
conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools \
    bbmap spades idba megahit maxbin2 prokka metabat2 drep quast checkm-genome
pip install insilicoseq

# in python2 environment
conda install metaphlan2

# database configuration
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
mkdir checkm_data
cd checkm_data
tar -xzvf ../checkm_data_2015_01_16.tar.gz
cd ..
ln -s checkm_data checkm_data_latest

# activate the python3 environment where checkm is installed
checkm data setRoot checkm_data_latest
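After installation, it can help to confirm that the tools are actually reachable from the active environment. The following is a minimal sketch, not part of metapi: the executable names in `REQUIRED_TOOLS` are assumptions about how each package exposes its command, and the helper `find_missing_tools` is a hypothetical name introduced here for illustration.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = [
    "snakemake", "pigz", "fastp", "bwa", "samtools",
    "megahit", "checkm", "prokka",
]


def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it once in the python3 environment and once in the python2 environment, since the two environments hold different tools.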
Install metapi
# recommended
git clone https://github.com/ohmeta/metapi

# or (maybe not the latest)
pip install metapi
Example
A Snakemake rule :)
rule bwa_mem:
    input:
        r1 = "fastq/sample_1.fq.gz",
        r2 = "fastq/sample_2.fq.gz",
        ref = "ref/ref.index"
    output:
        bam = "sample.sort.bam",
        stat = "sample_flagstat.txt"
    threads: 8
    shell:
        '''
        bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | \
        samtools view -@{threads} -hbS - | \
        tee >(samtools flagstat -@{threads} - > {output.stat}) | \
        samtools sort -@{threads} -o {output.bam} -
        '''
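To see what command the rule above ends up running, the placeholders can be expanded by hand. The sketch below only imitates the substitution with Python's `str.format`; it is not Snakemake's own implementation, and `render_command` is a hypothetical helper introduced for illustration.

```python
from types import SimpleNamespace

# Values copied from the bwa_mem rule.
rule_input = SimpleNamespace(
    r1="fastq/sample_1.fq.gz",
    r2="fastq/sample_2.fq.gz",
    ref="ref/ref.index",
)
rule_output = SimpleNamespace(bam="sample.sort.bam", stat="sample_flagstat.txt")

SHELL_TEMPLATE = (
    "bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | "
    "samtools view -@{threads} -hbS - | "
    "tee >(samtools flagstat -@{threads} - > {output.stat}) | "
    "samtools sort -@{threads} -o {output.bam} -"
)


def render_command(template, threads, files_in, files_out):
    """Fill {threads}, {input.*} and {output.*} in a shell template."""
    return template.format(threads=threads, input=files_in, output=files_out)


command = render_command(SHELL_TEMPLATE, 8, rule_input, rule_output)
print(command)
```

The `tee >(...)` process substitution lets a single BAM stream feed both `samtools flagstat` and `samtools sort`, so the alignment is never written to disk unsorted.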
Simulated metagenomics data test (unfinished)
# in metapi/example/basic_test directory
cd example/basic_test

# look at the DAG
snakemake --dag | dot -Tsvg > dat.svg

# run on local
snakemake

# run on SGE cluster
snakemake \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project} -l vf={mem},p={cores} -binding linear:{cores}"
Real-world metagenomics data processing (unfinished)
# in metapipe directory
# look at the DAG
cd metapi
snakemake --dag | dot -Tsvg > ../docs/dat.svg

# run on local
snakemake \
    --cores 8 \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --until all

# run on SGE cluster
snakemake \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --cluster-config metapi/metacluster.yaml \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} -l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} -o {cluster.output} -e {cluster.error}" \
    --latency-wait 360 \
    --until all
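The `{cluster.*}` placeholders in the `qsub` string are filled from the cluster config file, where a `__default__` entry supplies values that rule-specific entries can override. The sketch below imitates that lookup in plain Python. The contents of `CLUSTER_CONFIG` are made-up illustrative values, not the real `metacluster.yaml`, and `settings_for` and `render_qsub` are hypothetical helper names.

```python
from types import SimpleNamespace

# Hypothetical cluster settings; queue, project and resource values are invented.
CLUSTER_CONFIG = {
    "__default__": {
        "queue": "example.q",
        "project": "example_project",
        "mem": "1G",
        "cores": 1,
        "output": "logs/default.o",
        "error": "logs/default.e",
    },
    # Per-rule override for a heavier step.
    "checkm_lineage_wf": {"mem": "40G", "cores": 8},
}

QSUB_TEMPLATE = (
    "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} "
    "-l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} "
    "-o {cluster.output} -e {cluster.error}"
)


def settings_for(rule_name, cluster_config):
    """Merge the defaults with any rule-specific overrides."""
    merged = dict(cluster_config.get("__default__", {}))
    merged.update(cluster_config.get(rule_name, {}))
    return merged


def render_qsub(rule_name, cluster_config, template=QSUB_TEMPLATE):
    """Expand the {cluster.*} placeholders for one rule."""
    cluster = SimpleNamespace(**settings_for(rule_name, cluster_config))
    return template.format(cluster=cluster)


print(render_qsub("checkm_lineage_wf", CLUSTER_CONFIG))
print(render_qsub("some_other_rule", CLUSTER_CONFIG))
```

Keeping resource requests in a separate config file means the same Snakefile can run unchanged on a laptop or on a cluster.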
metapi command-line interface
Init
metapi --help

usage: metapi [subcommand] [options]

metapi, a metagenomics data process pipeline

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  print software version and exit

available subcommands:
  init           a metagenomics project initialization
  simulation     a simulation on metagenomics data
  workflow       a workflow on real metagenomics data
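A command-line layout like the one in the help text above is commonly built with `argparse` subparsers. The sketch below is one possible way to define it, not metapi's actual source. The short flags `-d`, `-s`, `-b` and `-a` come from the `init` invocation in this document, but their `dest` names and meanings here are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with init, simulation and workflow subcommands."""
    parser = argparse.ArgumentParser(
        prog="metapi",
        usage="metapi [subcommand] [options]",
        description="metapi, a metagenomics data process pipeline",
    )
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="metapi (version not specified in this sketch)",
        help="print software version and exit",
    )
    subparsers = parser.add_subparsers(
        title="available subcommands", dest="subcommand"
    )

    init = subparsers.add_parser(
        "init", help="a metagenomics project initialization"
    )
    # The dest names below are guesses at what each flag means.
    init.add_argument("-d", dest="workdir", default=".")
    init.add_argument("-s", dest="samples")
    init.add_argument("-b", dest="begin")
    init.add_argument("-a", dest="assembler")

    subparsers.add_parser(
        "simulation", help="a simulation on metagenomics data"
    )
    subparsers.add_parser(
        "workflow", help="a workflow on real metagenomics data"
    )
    return parser


args = build_parser().parse_args(
    ["init", "-d", ".", "-s", "samples.tsv", "-b", "raw", "-a", "metaspades"]
)
print(args.subcommand, args.samples, args.assembler)
```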
Please supply samples.tsv
Format
| id | fq1 | fq2 |
|----|-----|-----|
| s1 | s1.1.fq.gz | s1.2.fq.gz |
| s2 | s2.1.fq.gz | s2.2.fq.gz |

python /path/to/metapi/metapi/metapi.py init -d . -s samples.tsv -b raw -a metaspades
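Before running `init`, the sample sheet can be checked for the three expected columns. This is a minimal sketch that assumes `samples.tsv` is tab-separated with a header row of `id`, `fq1`, `fq2`. The function `read_samples` is a hypothetical helper and not part of metapi.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "fq1", "fq2"]


def read_samples(handle):
    """Parse a tab-separated sample sheet into {id: (fq1, fq2)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(
            "expected columns %s, got %s" % (EXPECTED_COLUMNS, reader.fieldnames)
        )
    samples = {}
    for row in reader:
        sample_id = row["id"]
        if sample_id in samples:
            raise ValueError("duplicate sample id: %s" % sample_id)
        samples[sample_id] = (row["fq1"], row["fq2"])
    return samples


# In-memory stand-in for a real samples.tsv file.
example = io.StringIO(
    "id\tfq1\tfq2\n"
    "s1\ts1.1.fq.gz\ts1.2.fq.gz\n"
    "s2\ts2.1.fq.gz\ts2.2.fq.gz\n"
)
samples = read_samples(example)
print(samples)
```

For a file on disk, pass an open handle instead, for example `read_samples(open("samples.tsv", newline=""))`.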
List
snakemake --snakefile /path/to/metapi/metapi/Snakefile --configfile metaconfig.yaml --list
Debug
snakemake --snakefile /path/to/metapi/metapi/Snakefile \
    --configfile metaconfig.yaml \
    -p -r -n --debug-dag \
    --until checkm_lineage_wf
Simulation
Workflow