rnasamba: Python package on PyPI

A tool that uses deep learning to compute the coding potential of RNA transcript sequences.

rnasamba project description

Overview
Documentation
Installation
Download the pre-trained models
Usage
- `rnasamba-train`
- `rnasamba-classify`
Examples
Citation

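Overview

RNAsamba is a tool for computing the coding potential of RNA sequences using a neural network classification model. A description of the algorithm and benchmarks comparing RNAsamba with other tools can be found in our article (see Citation).

Web version

RNAsamba can be used through a minimal web interface that is freely available online at https://rnasamba.lge.ibi.unicamp.br/.

Documentation

The complete documentation for RNAsamba is available at https://apcamargo.github.io/RNAsamba/.

Installation

There are two ways to install rnasamba:

Using pip: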

pip install rnasamba

Using conda:

conda install -c bioconda rnasamba

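Download the pre-trained models

We provide two HDF5 files containing the weights of classification models trained with human transcript sequences. The first model (full_length_weights.hdf5) was trained exclusively with full-length transcripts and can be used for datasets composed mostly or exclusively of complete transcript sequences. The second model (partial_length_weights.hdf5) was trained with both complete and truncated transcripts and is preferred when a significant fraction of the sequences is partial-length, as in transcriptomes assembled de novo.

Both models achieve high classification performance on transcripts from a variety of species (see the reference in the Citation section).

You can download the files by running the following commands: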

curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/full_length_weights.hdf5
curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/partial_length_weights.hdf5
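For environments without curl, the same two files could be fetched from Python. The following is a minimal sketch using only the standard library; the URLs are the ones from the curl commands above, while the helper names `weights_url` and `download_weights` are hypothetical and not part of rnasamba.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names taken from the curl commands above.
BASE_URL = "https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data"
WEIGHT_FILES = ("full_length_weights.hdf5", "partial_length_weights.hdf5")


def weights_url(file_name):
    """Return the download URL for one of the two published weights files."""
    if file_name not in WEIGHT_FILES:
        raise ValueError(f"unknown weights file: {file_name}")
    return f"{BASE_URL}/{file_name}"


def download_weights(file_name, dest_dir="."):
    """Download a weights file into dest_dir, skipping files that already exist."""
    target = Path(dest_dir) / file_name
    if not target.exists():
        urlretrieve(weights_url(file_name), str(target))
    return target
```

Calling `download_weights("full_length_weights.hdf5")` would save the file in the current directory and return its path; a second call would reuse the existing copy instead of downloading it again.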

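If you want to train your own model, follow the steps shown in the Examples section.

Usage

rnasamba provides two commands: rnasamba-train and rnasamba-classify.

`rnasamba-train`

rnasamba-train is the command for training a new classification model from a training dataset and saving the network weights to an HDF5 file. The user can specify the batch size (--batch_size) and the number of training epochs (--epochs). The user can also enable early stopping (--early_stopping), which can reduce training time and help avoid overfitting.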

usage: rnasamba-train [-h] [-s EARLY_STOPPING] [-b BATCH_SIZE] [-e EPOCHS]
                      [-v {0,1,2,3}]
                      output_file coding_file noncoding_file

Train a new classification model.

positional arguments:
  output_file           output HDF5 file containing weights of the newly
                        trained RNAsamba network.
  coding_file           input FASTA file containing sequences of protein-
                        coding transcripts.
  noncoding_file        input FASTA file containing sequences of noncoding
                        transcripts.

optional arguments:
  -h, --help            show this help message and exit
  -s EARLY_STOPPING, --early_stopping EARLY_STOPPING
                        number of epochs after lowest validation loss before
                        stopping training (a fraction of 0.1 of the training
                        set is set apart for validation and the model with the
                        lowest validation loss will be saved). (default: 0)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        number of samples per gradient update. (default: 128)
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to train the model. (default: 40)
  -v {0,1,2,3}, --verbose {0,1,2,3}
                        print the progress of the training. 0 = silent, 1 =
                        current step, 2 = progress bar, 3 = one line per
                        epoch. (default: 0)
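To illustrate what the --early_stopping value controls, here is a small pure-Python sketch of patience-based early stopping. It is not RNAsamba's implementation: the function name and the loss values are made up for illustration, and it assumes that the default of 0 means early stopping is disabled.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epochs_run, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` epochs have passed since the lowest
    validation loss. A patience of 0 is assumed to disable early stopping.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New lowest validation loss: this is the model that would be kept.
            best_loss = loss
            best_epoch = epoch
        elif patience > 0 and epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return epochs_run, best_epoch


# Hypothetical validation losses for seven epochs.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(simulate_early_stopping(losses, patience=2))  # (5, 3)
print(simulate_early_stopping(losses, patience=0))  # (7, 7)
```

With a patience of 2 the run ends after epoch 5 and keeps the epoch 3 model, so it never reaches the better loss at epoch 7. This is the trade-off behind the option: a small value shortens training but may stop before a late improvement.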

`rnasamba-classify`

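rnasamba-classify is the command for computing the coding potential of the transcripts in an input FASTA file and classifying them as coding or noncoding. Optionally, the user can specify an output FASTA file (--protein_fasta) to which RNAsamba writes the translated sequences of the predicted coding ORFs. If multiple weights files are provided, RNAsamba ensembles their predictions into a single output.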

usage: rnasamba-classify [-h] [-p PROTEIN_FASTA] [-v {0,1}]
                         output_file fasta_file weights [weights ...]

Classify sequences from a input FASTA file.

positional arguments:
  output_file           output TSV file containing the results of the
                        classification.
  fasta_file            input FASTA file containing transcript sequences.
  weights               input HDF5 file(s) containing weights of a trained
                        RNAsamba network (if more than a file is provided, an
                        ensembling of the models will be performed).

optional arguments:
  -h, --help            show this help message and exit
  -p PROTEIN_FASTA, --protein_fasta PROTEIN_FASTA
                        output FASTA file containing translated sequences for
                        the predicted coding ORFs. (default: None)
  -v {0,1}, --verbose {0,1}
                        print the progress of the classification. 0 = silent,
                        1 = current step. (default: 0)
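When the classification is one step in a larger script, the command line can be assembled and launched from Python. The sketch below uses only the options listed in the usage text above; the wrapper functions `build_classify_command` and `run_classify` are hypothetical helpers, not part of rnasamba.

```python
import subprocess


def build_classify_command(output_file, fasta_file, weights, protein_fasta=None, verbose=0):
    """Assemble an rnasamba-classify argument list from the documented options."""
    if not weights:
        raise ValueError("at least one weights file is required")
    if verbose not in (0, 1):
        raise ValueError("verbose must be 0 or 1")
    command = ["rnasamba-classify"]
    if protein_fasta is not None:
        command += ["-p", protein_fasta]
    command += ["-v", str(verbose)]
    # Positional arguments: output TSV, input FASTA, then one or more weights files.
    command += [output_file, fasta_file, *weights]
    return command


def run_classify(*args, **kwargs):
    """Run rnasamba-classify and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_classify_command(*args, **kwargs), check=True)
```

Passing both `full_length_weights.hdf5` and `partial_length_weights.hdf5` in `weights` corresponds to the ensembling case described in the help text. `run_classify` requires rnasamba to be installed and available on the PATH.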

Examples

Train a new classification model using Mus musculus data downloaded from GENCODE:

rnasamba-train mouse_model.hdf5 -v 2 gencode.vM21.pc_transcripts.fa gencode.vM21.lncRNA_transcripts.fa

Classify sequences using our pre-trained model (full_length_weights.hdf5) and save the predicted proteins to a FASTA file:

rnasamba-classify -p predicted_proteins.fa classification.tsv input.fa full_length_weights.hdf5
head classification.tsv

sequence_name	coding_score	classification
ENSMUST00000054910	0.99022	coding
ENSMUST00000059648	0.84718	coding
ENSMUST00000055537	0.99713	coding
ENSMUST00000030975	0.85189	coding
ENSMUST00000050754	0.02638	noncoding
ENSMUST00000008011	0.14949	noncoding
ENSMUST00000061643	0.03456	noncoding
ENSMUST00000059704	0.89232	coding
ENSMUST00000036304	0.03782	noncoding
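The output is a tab-separated file with the three columns shown above, so it can be read with Python's standard csv module. The following sketch embeds a few of the rows from the example so that it runs on its own; the helper name `read_classification` is hypothetical.

```python
import csv
import io
from collections import Counter

# A few rows copied from the example output above.
SAMPLE_TSV = (
    "sequence_name\tcoding_score\tclassification\n"
    "ENSMUST00000054910\t0.99022\tcoding\n"
    "ENSMUST00000059648\t0.84718\tcoding\n"
    "ENSMUST00000050754\t0.02638\tnoncoding\n"
    "ENSMUST00000008011\t0.14949\tnoncoding\n"
)


def read_classification(handle):
    """Parse classification rows, converting coding_score to float."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "sequence_name": row["sequence_name"],
            "coding_score": float(row["coding_score"]),
            "classification": row["classification"],
        }
        for row in reader
    ]


rows = read_classification(io.StringIO(SAMPLE_TSV))
counts = Counter(row["classification"] for row in rows)
top = max(rows, key=lambda row: row["coding_score"])

print(counts["coding"], counts["noncoding"])  # 2 2
print(top["sequence_name"])  # ENSMUST00000054910
```

To process a real result file, pass an open file instead of the embedded sample, for example `open("classification.tsv", newline="")`.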

Citation

Camargo, Antonio P., Vsevolod Sourkov, and Marcelo F. Carazzolle. "RNAsamba: coding potential assessment using ORF and whole transcript sequence information" BioRxiv (2019).

rnasamba 0.1.4
