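Internal BFSSI package for assembling prokaryotic genomes from short reads
Project description for the ProkaryoteAssembly Python package
ProkaryoteAssembly
Two simple scripts for assembling prokaryotic genomes from paired-end reads.
Pipeline Overview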
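- QC on reads with bbduk.sh (adapter trimming/quality filtering)
- Error-correct reads with tadpole.sh
- Assemble reads with skesa
- Align the error-corrected reads against the draft assembly with bbmap.sh
- Polish the assembly with pilon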
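The steps above can be sketched as an ordered list of command lines. This is only an illustration: the file names, the `build_pipeline_commands` helper, and every tool flag shown here are assumptions taken from each tool's public documentation, not necessarily what the package runs internally. The samtools sort/index steps are likewise assumed, based on samtools being listed as a dependency.

```python
from pathlib import Path


def build_pipeline_commands(r1, r2, out_dir, sample="sample", memory="8g"):
    """Return the pipeline as ordered (step, argv) pairs without running anything.

    Hypothetical helper: intermediate file names and tool flags are assumptions.
    """
    out = Path(out_dir)
    qc1, qc2 = out / f"{sample}_R1.qc.fastq.gz", out / f"{sample}_R2.qc.fastq.gz"
    ec1, ec2 = out / f"{sample}_R1.ec.fastq.gz", out / f"{sample}_R2.ec.fastq.gz"
    draft = out / f"{sample}.draft.fasta"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    return [
        # 1. Adapter trimming and quality filtering
        ("qc", ["bbduk.sh", f"in={r1}", f"in2={r2}", f"out={qc1}", f"out2={qc2}",
                "ref=adapters", "ktrim=r", "k=23", "mink=11", "hdist=1",
                "tbo", "tpe", "qtrim=rl", "trimq=10"]),
        # 2. Read error correction
        ("error_correction", ["tadpole.sh", f"in={qc1}", f"in2={qc2}",
                              f"out={ec1}", f"out2={ec2}", "mode=correct"]),
        # 3. Draft assembly
        ("assembly", ["skesa", "--fastq", f"{ec1},{ec2}",
                      "--contigs_out", str(draft)]),
        # 4. Align corrected reads to the draft assembly
        ("alignment", ["bbmap.sh", f"ref={draft}", f"in={ec1}", f"in2={ec2}",
                       f"out={sam}", "nodisk"]),
        # Assumed glue steps: pilon expects a sorted, indexed BAM
        ("sort", ["samtools", "sort", "-o", str(bam), str(sam)]),
        ("index", ["samtools", "index", str(bam)]),
        # 5. Polishing
        ("polish", ["pilon", f"-Xmx{memory}", "--genome", str(draft),
                    "--frags", str(bam), "--output", f"{sample}.polished",
                    "--outdir", str(out)]),
    ]


if __name__ == "__main__":
    for step, argv in build_pipeline_commands("s_R1.fastq.gz", "s_R2.fastq.gz", "out"):
        print(step, " ".join(argv))
```

Building the commands as data first makes it easy to log or dry-run the pipeline before handing each `argv` to something like `subprocess.run(argv, check=True)`.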
Installation
pip install ProkaryoteAssembly
Usage
The first script, prokaryote_assemble.py, operates on one sample at a time.
Usage: prokaryote_assemble.py [OPTIONS]

Options:
  -1, --fwd_reads PATH  Path to forward reads (R1) (gzipped FASTQ).
                        [required]
  -2, --rev_reads PATH  Path to reverse reads (R2) (gzipped FASTQ).
                        [required]
  -o, --out_dir PATH    Root directory to store all output files. [required]
  -m, --memory TEXT     Amount of memory to allocate to job. e.g. "8g".
                        Defaults to 8g.
  --cleanup             Specify this flag to delete all intermediary files
                        except the resulting FASTA assembly.
  --version             Specify this flag to print the version and exit.
  --help                Show this message and exit.
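As a rough illustration of the options listed above, the following sketch assembles an argument list for a single-sample run. The `single_sample_argv` helper and the example paths are hypothetical; only the option names come from the help text.

```python
def single_sample_argv(fwd_reads, rev_reads, out_dir, memory="8g", cleanup=False):
    """Build an argv list for prokaryote_assemble.py (hypothetical helper)."""
    argv = [
        "prokaryote_assemble.py",
        "-1", str(fwd_reads),   # forward reads (R1), gzipped FASTQ
        "-2", str(rev_reads),   # reverse reads (R2), gzipped FASTQ
        "-o", str(out_dir),     # root output directory
        "-m", memory,           # memory to allocate, e.g. "8g"
    ]
    if cleanup:
        # Keep only the final FASTA assembly
        argv.append("--cleanup")
    return argv


if __name__ == "__main__":
    # Example paths are placeholders
    print(" ".join(single_sample_argv("s1_R1.fastq.gz", "s1_R2.fastq.gz", "results/s1")))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the package is installed and its external dependencies are available.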
The second script, prokaryote_assemble_dir.py, scans a directory and runs the assembly pipeline on every sample for which it can pair forward and reverse reads.
Usage: prokaryote_assemble_dir.py [OPTIONS]

Options:
  -i, --input_dir PATH  Directory containing all *.fastq.gz files to
                        assemble. NOTE: Files must be gzipped in order to be
                        detected. [required]
  -o, --out_dir PATH    Root directory to store all output files. [required]
  -f, --fwd_id TEXT     Pattern to detect forward reads. Defaults to "_R1".
  -r, --rev_id TEXT     Pattern to detect reverse reads. Defaults to "_R2".
  -m, --memory TEXT     Memory to allocate to pilon call. Defaults to 8g (i.e.
                        pilon -Xmx8g). May need to provide a large amount of
                        memory for large read sets/assemblies.
  --cleanup             Specify this flag to delete all intermediary files
                        except the resulting FASTA assembly.
  --version             Specify this flag to print the version and exit.
  --help                Show this message and exit.
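To make the pairing behaviour concrete, here is a minimal sketch of how forward and reverse reads might be matched using the `--fwd_id` and `--rev_id` patterns. The `pair_reads` function is hypothetical and the matching rule (swap the forward pattern for the reverse one and check that the file exists) is an assumption; the script's actual logic may differ.

```python
from pathlib import Path


def pair_reads(input_dir, fwd_id="_R1", rev_id="_R2"):
    """Map sample name -> (forward, reverse) for every pairable *.fastq.gz file.

    Hypothetical sketch: only gzipped FASTQ files are considered, and a sample
    is kept only when both mates are present.
    """
    pairs = {}
    for fwd in sorted(Path(input_dir).glob("*.fastq.gz")):
        if fwd_id not in fwd.name:
            continue  # not a forward-read file
        # Derive the expected mate by swapping the first forward pattern
        rev = fwd.with_name(fwd.name.replace(fwd_id, rev_id, 1))
        if rev.is_file():
            sample = fwd.name.split(fwd_id)[0]
            pairs[sample] = (fwd, rev)
    return pairs


if __name__ == "__main__":
    for sample, (fwd, rev) in pair_reads(".").items():
        print(sample, fwd.name, rev.name)
```

Under this assumption, a forward file with no matching reverse file is silently skipped, which matches the description of running only on samples that can be paired.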
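Python (3.6) dependencies
- click
External dependencies
NOTE: All external dependencies must be available on your PATH.
Versions confirmed to work are given in parentheses.
- skesa (skesa v.2.1-svn_551987:557549m)
- BBMap (BBMap version 38.22)
- samtools (samtools 1.8 using htslib 1.8)
- pilon (Pilon version 1.22)
NOTE: Installing Pilon via Conda is strongly recommended, e.g. https://bioconda.github.io/recipes/pilon/README.html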
conda install pilon
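Since every external dependency must be discoverable on PATH, a quick pre-flight check can save a failed run. The sketch below uses `shutil.which` from the standard library; the `missing_tools` helper and the exact executable names in `REQUIRED_TOOLS` are assumptions based on the tools named in the pipeline overview.

```python
import shutil

# Executable names assumed from the pipeline overview and dependency list
REQUIRED_TOOLS = ("skesa", "bbduk.sh", "tadpole.sh", "bbmap.sh", "samtools", "pilon")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external dependencies were found on PATH.")
```

This only confirms that the executables exist; it does not verify that the installed versions match the ones listed above.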