HIFI-SE

Detailed description of the HIFI-SE Python project

Hifi-Barcode-SE400

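The BGISEQ-500 platform has launched a new test sequencing kit capable of single-end 400 bp sequencing (SE400), which offers a simple and reliable way to produce DNA barcodes efficiently. This study explores the potential of BGISEQ-500 SE400 sequencing for building DNA barcode references, and provides an updated HIFI-Barcode software package that can generate COI barcode assemblies from HTS reads of 400 bp in length.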

Manual

manual book

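Versions

Version 1.0.5 (Python)
v1.0.5 2019-04-09 added support for compressed fastq; fixed a classification bug
v1.0.4 2019-04-02 fixed a "polish" bug and updated the bold_identification module
v1.0.3 2018-12-14 fixed a "trim" bug
v1.0.2 2018-12-10 added the "-trim" option to filter; accept mismatches in tag or primer sequences when demultiplexing; accept uneven reads for assembly; added "-ds" to drop short reads before assembly
v1.0.1 2018-12-02 added the "polish" function
Version 1.0.0: HIFI-SE v1.0.0, 2018-11-22. Changes from the previous version:
Reformatted the Python code to follow PEP 8 style.
Fixed several small bugs.
Version 0.0.3: HIFI-SE v0.0.3, 2018-11-15. Changes from the previous version:
Revised the descriptions of some parameters to make them easier to understand.
Version 0.0.1: HIFI-SE v0.0.1, 2018-11-03. Beta version; built the framework and implemented almost all functions.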

Original Perl and Python versions (original source code)

0.expected_error.pl
1.split_extract.pl
2.hificonnect.pl

0.expected_error.py
1.split_extract.py
2.hificonnect.py

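Installation

System requirements and dependencies
Operating system: HIFI-SE is designed to run on most platforms, including UNIX, Linux, MacOS/X, and Microsoft Windows. We have only tested it on Linux and MacOS/X, because those are the machines we develop on. HIFI-SE is written in Python and requires version 3.5 or higher.

Dependencies:

Biopython 1.5 or higher (required). See https://biopython.org/ and https://pypi.org/project/biopython/#description for more details on installing Biopython.
Another Python package, bold_identification, is also required for the full functionality of HIFI-SE. See https://pypi.org/project/bold-identification/
HIFI-SE assumes that vsearch is installed on your machine and available in your $PATH. See https://github.com/torognes/vsearch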
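
The requirements listed above can be checked before running the pipeline. The following is a minimal sketch, not part of HIFI-SE: the function name `check_environment` is hypothetical, and it assumes that the bold_identification package exposes an importable module of the same name.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 5)):
    """Return a dict describing which documented requirements are met."""
    return {
        # HIFI-SE requires Python 3.5 or higher.
        "python": sys.version_info >= min_python,
        # Biopython is imported under the module name "Bio".
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Assumption: the package exposes a module named bold_identification.
        "bold_identification":
            importlib.util.find_spec("bold_identification") is not None,
        # vsearch must be discoverable through $PATH (or passed via -vsearch).
        "vsearch": shutil.which("vsearch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{:<20} {}".format(name, "found" if ok else "MISSING"))
```

If vsearch is reported as missing but is installed elsewhere, its location can be given to the assembly step with `-vsearch`.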

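Install

I only deploy my latest version on GitHub, so you can clone this repository to your local computer. However, cloning does not resolve package dependencies, so you need to install Biopython and bold_identification before using HIFI-SE. (Note: here pip is a link to pip3.)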
```
git clone https://github.com/comery/HIFI-barcode-SE400.git
pip install biopython
pip install bold_identification  
```
Installing with pip is recommended, because it resolves package dependencies automatically, including biopython and bold_identification.
pip install HIFI-SE

Usage (latest)

python3 HIFI-SE.py

or

./HIFI-SE.py

usage: HIFI-SE [-h] [-v]
               {all,filter,assign,assembly,polish,bold_identification} ...

Description

    An automatic pipeline for HIFI-SE400 project, including filtering
    raw reads, assigning reads to samples, assembly HIFI barcodes
    (COI sequences), polished assemblies, and do tax identification.
    See more: https://github.com/comery/HIFI-barcode-SE400

Versions

    1.0.4 (20190402)

Authors

    yangchentao at genomics.cn, BGI.
    mengguanliang at genomics.cn, BGI.

positional arguments:
  {all,filter,assign,assembly,polish,bold_identification}
    all                 run filter, assign and assembly.
    filter              remove or trim reads with low quality.
    assign              assign reads to samples by tags.
    assembly            do assembly from assigned reads,
                        output raw HIFI barcodes.
    polish              polish COI barcode assemblies,
                        output confident barcodes.
    bold_identification
                        do taxa identification on BOLD system

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Run by steps [filter -> assign -> assembly]

python3 HIFI-SE.py filter

usage: HIFI-SE filter [-h] -outpre <STR> -raw <STR> [-phred <INT>] [-e <INT>]
                      [-q <INT> <INT>] [-trim] [-n <INT>]

optional arguments:
  -h, --help      show this help message and exit

common arguments:
  -outpre <STR>   prefix for output files

filter arguments:
  -raw <STR>      input raw Single-End fastq file, and only
                  adapters should be removed; supposed on
                  Phred33 score system (BGISEQ-500)
  -phred <INT>    Phred score system, 33 or 64, default=33
  -e <INT>        expected error threshod, default=10
                  see more: http://drive5.com/usearch/manual/exp_errs.html
  -q <INT> <INT>  filter by base quality; for example: '20 5' means
                  dropping read which contains more than 5 percent of
                  quality score < 20 bases.
  -trim           whether to trim 5' end of read, it adapts to -e mode
                  or -q mode
  -n <INT>        remove reads containing [INT] Ns, default=1
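
The `-e`, `-q` and `-n` options above rest on simple per-read arithmetic. The sketch below illustrates the expected-error idea (the sum of per-base error probabilities derived from Phred scores). It is not HIFI-SE's own code: the function names are hypothetical, and the exact comparison operators (for example whether a read with exactly `-n` Ns is dropped) are assumptions.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality, offset))


def passes_filter(seq, quality, max_ee=10, max_n=1, offset=33):
    """Keep a read if its N count and expected errors are acceptable."""
    # Assumption: a read is dropped once it contains max_n or more Ns.
    if seq.upper().count("N") >= max_n:
        return False
    # Assumption: reads at or below the threshold are kept.
    return expected_errors(quality, offset) <= max_ee


def low_quality_percent(quality, min_q=20, offset=33):
    """Percentage of bases with a Phred score below min_q (the -q idea)."""
    scores = phred_scores(quality, offset)
    return 100.0 * sum(q < min_q for q in scores) / len(scores)
```

With `-q 20 5`, a read would be dropped when `low_quality_percent(quality, 20)` exceeds 5.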

python3 HIFI-SE.py assign

usage: HIFI-SE assign [-h] -outpre <STR> -index INT -fq <STR> -primer <STR>
                      [-outdir <STR>] [-tmis <INT>] [-pmis <INT>]

optional arguments:
  -h, --help     show this help message and exit

common arguments:
  -outpre <STR>  prefix for output files

index arguments:
  -index INT     the length of tag sequence in the ends of primers

when only run assign arguments:
  -fq <STR>      cleaned fastq file

assign arguments:
  -primer <STR>  taged-primer list, on following format:
                 Rev001   AAGCTAAACTTCAGGGTGACCAAAAAATCA
                 For001   AAGCGGTCAACAAATCATAAAGATATTGG
                 ...
                 this format is necessary!
  -outdir <STR>  output directory for assignment,default="assigned"
  -tmis <INT>    mismatch number in tag when demultiplexing, default=0
  -pmis <INT>    mismatch number in primer when demultiplexing, default=1
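
To make the primer list format and the `-index`, `-tmis` and `-pmis` options concrete, here is a minimal sketch of tag-based assignment. It is an illustration only, not the HIFI-SE implementation: the function names are hypothetical, and it assumes that the tag is the first `index_len` bases of each listed sequence and that the tagged primer sits at the very start of the read.

```python
def read_primer_list(lines, index_len):
    """Parse 'name<whitespace>sequence' lines into {name: (tag, primer)}."""
    primers = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, seq = fields
        # Assumption: the tag is the first index_len bases of the sequence.
        primers[name] = (seq[:index_len].upper(), seq[index_len:].upper())
    return primers


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, primers, tmis=0, pmis=1):
    """Return the first tagged primer matching the read start, or None."""
    read = read.upper()
    for name, (tag, primer) in primers.items():
        total = len(tag) + len(primer)
        if len(read) < total:
            continue  # read too short to contain tag plus primer
        tag_ok = mismatches(read[:len(tag)], tag) <= tmis
        primer_ok = mismatches(read[len(tag):total], primer) <= pmis
        if tag_ok and primer_ok:
            return name
    return None
```

Keeping the tag and primer mismatch budgets separate mirrors the two options: tags distinguish samples and default to exact matching, while primers tolerate one mismatch by default.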

python3 HIFI-SE.py assembly

usage: HIFI-SE assembly [-h] -outpre <STR> -index INT -list FILE
                        [-vsearch <STR>] [-threads <INT>] [-cid FLOAT]
                        [-min INT] [-max INT] [-oid FLOAT] [-tp INT] [-ab INT]
                        [-seqs_lim INT] [-len INT] [-ds] [-mode INT] [-rc]
                        [-codon INT] [-frame INT]

optional arguments:
  -h, --help      show this help message and exit

common arguments:
  -outpre <STR>   prefix for output files

index arguments:
  -index INT      the length of tag sequence in the ends of primers

only run assembly arguments(not all):
  -list FILE      input file, fastq file list. [required]

software path:
  -vsearch <STR>  vsearch path(only needed if vsearch is not in $PATH)
  -threads <INT>  threads for vsearch, default=2
  -cid FLOAT      identity for clustering, default=0.98

assembly arguments:
  -min INT        minimun length of overlap, default=80
  -max INT        maximum length of overlap, default=90
  -oid FLOAT      minimun similarity of overlap region, default=0.95
  -tp INT         how many clusters will be used inassembly, recommend 2
  -ab INT         keep clusters to assembly if its abundance >=INT
  -seqs_lim INT   reads number limitation. by default,
                  no limitation for input reads
  -len INT        standard read length, default=400
  -ds             drop short reads away before assembly
  -mode INT       1 or 2; modle 1 is to cluster and keep
                  most [-tp] abundance clusters, or clusters
                  abundance more than [-ab], and then make a
                  consensus sequence for each cluster.
                  modle 2 is directly to make only one consensus
                  sequence without clustering. default=1
  -rc             whether to check amino acid
                  translation for reads, default not

translation arguments(when set -rc or -cc):
  -codon INT      codon usage table used to checktranslation, default=5
  -frame INT      start codon shift for amino acidtranslation, default=1
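
The `-min`, `-max` and `-oid` options describe an overlap search between the two ends of a barcode. The following sketch shows one way such a search could work on two sequences that are already on the same strand. It is a simplified illustration under stated assumptions, not HIFI-SE's assembly code (which also clusters reads with vsearch); all function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_overlap(forward, reverse, min_len=80, max_len=90, min_identity=0.95):
    """Find the best suffix/prefix overlap between forward and reverse.

    Both sequences must already be on the same strand. Returns
    (overlap_length, identity), or None if nothing reaches min_identity.
    """
    best = None
    upper = min(max_len, len(forward), len(reverse))
    for length in range(min_len, upper + 1):
        score = identity(forward[-length:], reverse[:length])
        if score >= min_identity and (best is None or score > best[1]):
            best = (length, score)
    return best


def merge(forward, reverse, overlap_len):
    """Join two same-strand sequences, keeping the forward overlap copy."""
    return forward + reverse[overlap_len:]
```

A reverse-end consensus would first be passed through `reverse_complement` so that both inputs share an orientation before calling `best_overlap`.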

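Quick start

Files used in the tutorial
All related files can be found here. The important files for the tutorial are:
raw.fastq.gz, the raw output fastq file generated by the BGISEQ-500 SE400 module.
index_primer.list, the list of tagged primers.

Run "all"

Example: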

python3 HIFI-SE.py all -outpre hifi -trim -e 5 -raw test.raw.fastq -index 5 -primer index_primer.list -mode 1 -cid 0.98 -oid 0.95 -seqs_lim 50000 -threads 4 -tp 2
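
When the pipeline is driven from another Python script, the same command can be assembled as an argument list. This is a minimal sketch that only reuses the flags and values from the example above; the helper names `build_all_command` and `run_all` are hypothetical, and it assumes `HIFI-SE.py` is in the current working directory.

```python
import subprocess


def build_all_command(raw, primer_list, outpre="hifi", index=5, threads=4):
    """Assemble the argument list for `HIFI-SE.py all` from the example."""
    return [
        "python3", "HIFI-SE.py", "all",
        "-outpre", outpre,
        "-trim", "-e", "5",
        "-raw", raw,
        "-index", str(index),
        "-primer", primer_list,
        "-mode", "1",
        "-cid", "0.98",
        "-oid", "0.95",
        "-seqs_lim", "50000",
        "-threads", str(threads),
        "-tp", "2",
    ]


def run_all(raw, primer_list, **kwargs):
    """Run the pipeline and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_all_command(raw, primer_list, **kwargs), check=True)
```

Passing a list rather than a single shell string avoids quoting problems when file names contain spaces.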

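Citation

This work has not been published yet, but it will be soon! I will update this section once it is published.

You are welcome to join the QQ group: 979659372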
