HIFI-SE
Hifi-Barcode-SE400
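The BGISEQ-500 platform has launched a new test sequencing kit capable of single-end 400 bp sequencing (SE400), which offers a simple and reliable way to produce DNA barcodes efficiently. This study explores the potential of BGISEQ-500 SE400 sequencing for building DNA barcode references, and provides an updated HIFI-Barcode software package that can generate COI barcode assemblies from HTS reads 400 bp in length.
Manual
Versions
v1.0.5 (Python)
- v1.0.5 2019-04-09 added support for compressed fastq files; fixed a classification bug
- v1.0.4 2019-04-02 fixed a "polish" bug and updated the bold_identification module
- v1.0.3 2018-12-14 fixed a "trim" bug
- v1.0.2 2018-12-10 added the "-trim" function to filter; accept mismatches in the tag or primer sequence when demultiplexing; accept uneven reads for assembly; added "-ds" to drop short reads before assembly.
- v1.0.1 2018-12-02 added the "polish" function
- v1.0.0
HIFI-SE v1.0.0, 2018-11-22. Changes from the previous version:
- Reformatted the Python code to follow PEP 8.
- Fixed several small bugs.
- v0.0.3
HIFI-SE v0.0.3, 2018-11-15. Changes from the previous version:
- Revised the descriptions of some parameters to make them easier to understand.
- v0.0.1
HIFI-SE v0.0.1, 2018-11-03. Beta version: built the framework and implemented almost all functions.
Original Perl and Python versions (original source code):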
0.expected_error.pl
1.split_extract.pl
2.hificonnect.pl
0.expected_error.py
1.split_extract.py
2.hificonnect.py
Installation
System requirements and dependencies
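Operating system: HIFI-SE is designed to run on most platforms, including UNIX, Linux, macOS and Microsoft Windows. We have tested it on Linux and macOS, since those are the machines we develop on. HIFI-SE is written in Python and requires version 3.5 or later.
Dependencies:
- Biopython 1.5 or later (required). See https://biopython.org/ and https://pypi.org/project/biopython/#description for more details on installing Biopython.
- Another Python package, bold_identification, is also required for the full functionality of HIFI-SE. See https://pypi.org/project/bold-identification/
- HIFI-SE assumes that vsearch is installed on your machine and available in $PATH. See https://github.com/torognes/vsearch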
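Before running the pipeline it can help to confirm that these requirements are in place. The following is a minimal sketch, not part of HIFI-SE: the function name check_environment is hypothetical, and the import name bold_identification is an assumption based on the package name on PyPI.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 5)):
    """Report which of the listed requirements appear to be available.

    Hypothetical helper for illustration only; HIFI-SE does not ship it.
    """
    return {
        # HIFI-SE requires Python 3.5 or later.
        "python": sys.version_info[:2] >= min_python,
        # Biopython is imported as the top-level package "Bio".
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Assumption: the module name matches the PyPI package name.
        "bold_identification": importlib.util.find_spec("bold_identification") is not None,
        # vsearch must be an executable reachable through $PATH.
        "vsearch": shutil.which("vsearch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{:<20} {}".format(name, "found" if ok else "MISSING"))
```

If vsearch is reported as missing but is installed elsewhere, the assembly step accepts its location through the -vsearch option.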
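Install
I only deploy the latest version on GitHub, so you can clone this repository to your local computer. However, cloning does not resolve package dependencies, so you need to install Biopython and bold_identification before using HIFI-SE. (Note: here pip is a link to pip3.)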
git clone https://github.com/comery/HIFI-barcode-SE400.git
pip install biopython
pip install bold_identification
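Installing with pip is recommended, because it resolves package dependencies automatically, including the biopython and bold_identification packages.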
pip install HIFI-SE
Usage (latest)
python3 HIFI-SE.py
or
./HIFI-SE.py
usage: HIFI-SE [-h] [-v] {all,filter,assign,assembly,polish,bold_identification} ...
Description
An automatic pipeline for HIFI-SE400 project, including filtering
raw reads, assigning reads to samples, assembly HIFI barcodes
(COI sequences), polished assemblies, and do tax identification.
See more: https://github.com/comery/HIFI-barcode-SE400
Versions
1.0.4 (20190402)
Authors
yangchentao at genomics.cn, BGI.
mengguanliang at genomics.cn, BGI.
positional arguments:
{all,filter,assign,assembly,polish,bold_identification}
all run filter, assign and assembly.
filter remove or trim reads with low quality.
assign assign reads to samples by tags.
assembly do assembly from assigned reads, output raw HIFI barcodes.
polish polish COI barcode assemblies, output confident barcodes.
bold_identification do taxa identification on BOLD system
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Run step by step [filter -> assign -> assembly]
python3 HIFI-SE.py filter
usage: HIFI-SE filter [-h] -outpre <STR> -raw <STR> [-phred <INT>]
[-e <INT>] [-q <INT> <INT>] [-trim] [-n <INT>]
optional arguments:
-h, --help show this help message and exit
common arguments:
-outpre <STR> prefix for output files
filter arguments:
-raw <STR> input raw Single-End fastq file, and only
adapters should be removed; supposed on
Phred33 score system (BGISEQ-500)
-phred <INT> Phred score system, 33 or 64, default=33
-e <INT> expected error threshod, default=10
see more: http://drive5.com/usearch/manual/exp_errs.html
-q <INT> <INT> filter by base quality; for example: '20 5' means
dropping read which contains more than 5 percent of
quality score < 20 bases.
-trim whether to trim 5' end of read, it adapts to -e mode
or -q mode
-n <INT> remove reads containing [INT] Ns, default=1
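The two filtering criteria can be illustrated with a short sketch. It is not HIFI-SE's own code: the function names are hypothetical, and whether the tool treats a read sitting exactly on a threshold as passing is an assumption. The expected number of errors of a read is the sum of the per-base error probabilities 10^(-Q/10), where Q is the Phred score; the -q rule instead counts the percentage of bases below a quality cutoff.

```python
def phred_scores(quality_string, offset=33):
    """Decode a fastq quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10.0) for q in phred_scores(quality_string, offset))


def low_quality_percent(quality_string, cutoff=20, offset=33):
    """Percentage of bases whose Phred score is below the cutoff."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return 0.0
    low = sum(1 for q in scores if q < cutoff)
    return 100.0 * low / len(scores)


def passes_filter(quality_string, max_ee=10, q_rule=None, offset=33):
    """Apply either the -q style rule or the -e style rule.

    q_rule is a (cutoff, max_percent) pair such as (20, 5).
    Assumption: a read exactly at the threshold is kept.
    """
    if q_rule is not None:
        cutoff, max_percent = q_rule
        return low_quality_percent(quality_string, cutoff, offset) <= max_percent
    return expected_errors(quality_string, offset) <= max_ee


if __name__ == "__main__":
    quality = "I" * 390 + "+" * 10  # 390 bases at Q40, 10 bases at Q10
    print(round(expected_errors(quality), 3))
    print(passes_filter(quality, max_ee=10))
    print(passes_filter(quality, q_rule=(20, 5)))
```

With the -q rule '20 5', the example read is kept because only 2.5 percent of its bases fall below Q20.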
python3 HIFI-SE.py assign
usage: HIFI-SE assign [-h] -outpre <STR> -index INT -fq <STR> -primer <STR>
[-outdir <STR>] [-tmis <INT>] [-pmis <INT>]
optional arguments:
-h, --help show this help message and exit
common arguments:
-outpre <STR> prefix for output files
index arguments:
-index INT the length of tag sequence in the ends of primers
when only run assign arguments:
-fq <STR> cleaned fastq file
assign arguments:
-primer <STR> taged-primer list, on following format:
Rev001 AAGCTAAACTTCAGGGTGACCAAAAAATCA
For001 AAGCGGTCAACAAATCATAAAGATATTGG
...
this format is necessary!
-outdir <STR> output directory for assignment,default="assigned"
-tmis <INT> mismatch number in tag when demultiplexing, default=0
-pmis <INT> mismatch number in primer when demultiplexing, default=1
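To make the primer list format and the -index, -tmis and -pmis options concrete, here is a minimal sketch of tag-based demultiplexing. It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the tag is assumed to be the first -index bases of each listed sequence, and mismatches are counted as substitutions only (Hamming distance).

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def parse_primer_list(lines, index_len):
    """Split each 'name sequence' line into (name, tag, primer)."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, seq = fields
        seq = seq.upper()
        entries.append((name, seq[:index_len], seq[index_len:]))
    return entries


def assign_read(read, entries, tmis=0, pmis=1):
    """Return the name of the first tagged primer matching the read start."""
    read = read.upper()
    for name, tag, primer in entries:
        head_len = len(tag) + len(primer)
        if len(read) < head_len:
            continue
        read_tag = read[:len(tag)]
        read_primer = read[len(tag):head_len]
        if hamming(read_tag, tag) <= tmis and hamming(read_primer, primer) <= pmis:
            return name
    return None


if __name__ == "__main__":
    primer_lines = [
        "Rev001 AAGCTAAACTTCAGGGTGACCAAAAAATCA",
        "For001 AAGCGGTCAACAAATCATAAAGATATTGG",
    ]
    entries = parse_primer_list(primer_lines, index_len=5)
    read = "AAGCGGTCAACAAATCATAAAGATATTGG" + "ACGT" * 10
    print(assign_read(read, entries))
```

Keeping the tag tolerance at 0 while allowing one primer mismatch mirrors the defaults listed above: a wrong tag sends a read to the wrong sample, whereas a primer mismatch only reflects sequence variation or a sequencing error.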
python3 HIFI-SE.py assembly
usage: HIFI-SE assembly [-h] -outpre <STR> -index INT -list FILE
[-vsearch <STR>] [-threads <INT>] [-cid FLOAT]
[-min INT] [-max INT] [-oid FLOAT] [-tp INT] [-ab INT]
[-seqs_lim INT] [-len INT] [-ds] [-mode INT] [-rc]
[-codon INT] [-frame INT]
optional arguments:
-h, --help show this help message and exit
common arguments:
-outpre <STR> prefix for output files
index arguments:
-index INT the length of tag sequence in the ends of primers
only run assembly arguments(not all):
-list FILE input file, fastq file list. [required]
software path:
-vsearch <STR> vsearch path(only needed if vsearch is not in $PATH)
-threads <INT> threads for vsearch, default=2
-cid FLOAT identity for clustering, default=0.98
assembly arguments:
-min INT minimun length of overlap, default=80
-max INT maximum length of overlap, default=90
-oid FLOAT minimun similarity of overlap region, default=0.95
-tp INT how many clusters will be used in assembly, recommend 2
-ab INT keep clusters to assembly if its abundance >=INT
-seqs_lim INT reads number limitation. by default,
no limitation for input reads
-len INT standard read length, default=400
-ds drop short reads away before assembly
-mode INT 1 or 2; modle 1 is to cluster and keep
most [-tp] abundance clusters, or clusters
abundance more than [-ab], and then make a
consensus sequence for each cluster.
modle 2 is directly to make only one consensus
sequence without clustering. default=1
-rc whether to check amino acid
translation for reads, default not
translation arguments(when set -rc or -cc):
-codon INT codon usage table used to check translation, default=5
-frame INT start codon shift for amino acid translation, default=1
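The overlap options (-min, -max, -oid) can be pictured with the sketch below. It is a simplified illustration, not the algorithm HIFI-SE runs: the function names are hypothetical, and it assumes the reverse read has already been reverse-complemented so that its start overlaps the end of the forward sequence. With two 400 bp reads and an overlap of 80 to 90 bp, the joined sequence is 710 to 720 bp long.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_overlap(forward, reverse, min_len=80, max_len=90, min_identity=0.95):
    """Find the overlap length with the highest identity.

    Compares the last `length` bases of `forward` with the first
    `length` bases of `reverse` for every length in [min_len, max_len].
    Returns (length, identity) or None if nothing reaches min_identity.
    """
    best = None
    for length in range(min_len, max_len + 1):
        if length > len(forward) or length > len(reverse):
            break
        score = identity(forward[-length:], reverse[:length])
        if score >= min_identity and (best is None or score > best[1]):
            best = (length, score)
    return best


def join_reads(forward, reverse, **kwargs):
    """Join two sequences at their best overlap, or return None."""
    hit = best_overlap(forward, reverse, **kwargs)
    if hit is None:
        return None
    length, _ = hit
    # Keep the forward sequence whole and append the non-overlapping part.
    return forward + reverse[length:]
```

A reverse read taken straight from the sequencer would first be passed through reverse_complement before calling join_reads.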
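Quick start
Files used in the tutorial
All related files can be found here. The important files for the tutorial are:
- raw.fastq.gz, the raw output fastq file generated by the BGISEQ-500 SE400 module.
- the indexed primer list, a list of tagged primers (passed with -primer; index_primer.list in the example command).
Run "all"
Example: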
python3 HIFI-SE.py all -outpre hifi -trim -e 5 -raw test.raw.fastq -index 5 -primer index_primer.list -mode 1 -cid 0.98 -oid 0.95 -seqs_lim 50000 -threads 4 -tp 2
Citation
This work has not been published yet, but it will be soon! I will update this section after publication.