Check for internal stop codons in a GenBank or FASTA (CDS) file, and replace each internal stop codon with NNN.
polish_genbank
1 Introduction
See https://github.com/linzhi2013/polish_genbank.
This package checks for internal stop codons in a GenBank or FASTA (CDS) file, then replaces each internal stop codon with NNN.
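The core idea can be sketched in plain Python. This is a minimal illustration, not the package's actual implementation. It assumes the CDS is read from the +1 position on the forward strand (as the --format help text states for FASTA input), treats the last complete codon as the terminal stop, and hard-codes stop codons for only two NCBI tables: 1 (Standard) and 2 (Vertebrate Mitochondrial). The function name mask_internal_stops is hypothetical.

```python
# Stop codons for two NCBI genetic code tables. Only these two tables are
# included; this is an illustrative subset, not a full table lookup.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # Standard
    2: {"TAA", "TAG", "AGA", "AGG"},  # Vertebrate Mitochondrial
}


def mask_internal_stops(cds: str, table: int = 2, replacement: str = "NNN") -> str:
    """Replace in-frame internal stop codons in a CDS, reading from +1.

    The last complete codon is treated as the terminal stop and left alone;
    a trailing partial codon, if any, is copied unchanged. Letter case of
    codons that are not replaced is preserved.
    """
    stops = STOP_CODONS[table]
    n_codons = len(cds) // 3
    out = []
    for i in range(n_codons):
        codon = cds[3 * i : 3 * i + 3]
        internal = i < n_codons - 1
        out.append(replacement if internal and codon.upper() in stops else codon)
    out.append(cds[3 * n_codons :])  # leftover bases (0, 1 or 2)
    return "".join(out)


if __name__ == "__main__":
    # Table 2: AGA is a stop, TGA codes for Trp, final TAA is the terminal stop.
    print(mask_internal_stops("ATGAGATGATAA", table=2))  # ATGNNNTGATAA
```

Under table 2, TGA codes for tryptophan rather than a stop, so the choice of genetic code table changes which codons get replaced.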
2 Installation
pip3 install polish_genbank
This creates the command polish_genbank in the same directory as your pip3 command.
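To confirm where the command was installed, a short Python check can look in the directory that contains pip3 first and then fall back to a normal PATH search. This is a hypothetical helper (find_polish_genbank is not part of the package); it relies only on the standard library's shutil.which.

```python
import os
import shutil
from typing import Optional


def find_polish_genbank() -> Optional[str]:
    """Return the full path of the polish_genbank command, or None.

    Looks in the directory that contains pip3 first, since the command is
    installed alongside pip3, then falls back to a normal PATH search.
    """
    pip3 = shutil.which("pip3")
    if pip3 is not None:
        found = shutil.which("polish_genbank", path=os.path.dirname(pip3))
        if found is not None:
            return found
    return shutil.which("polish_genbank")


if __name__ == "__main__":
    print(find_polish_genbank() or "polish_genbank not found")
```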
3 Usage
Run polish_genbank:
usage: polish_genbank.py [-h] --in <file> [--format {gb,fa}] [--table <int>]
                         [--ntNs <str>] [--aaNs <str>] --out <file>

Check for the internal stop codon, then substitute the internal stop codon
with NNN. By mengguanliang [] genomics.cn, where [] == @. See
https://github.com/linzhi2013/polish_genbank

optional arguments:
  -h, --help        show this help message and exit
  --in <file>       input genbank file or CDS file (fasta format)
  --format {gb,fa}  the input file format. For fasta file, all sequences are
                    assumed to be forward strand, coding from +1 position [gb]
  --table <int>     The genetic code table used for translation, for fasta
                    input only [2]
  --ntNs <str>      the chars used for substituting an internal stop codon in
                    CDS sequence. [NNN]
  --aaNs <str>      the chars used for substituting an internal stop codon in
                    protein sequence. [X]
  --out <file>      output filename
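The --aaNs option plays the same role on the protein side that --ntNs plays on the nucleotide side. The sketch below shows one plausible way to apply such a substitution to a translated sequence. The name mask_internal_stops_aa is hypothetical, and keeping a single trailing '*' as the terminal stop is an assumption, not documented behavior of polish_genbank.

```python
def mask_internal_stops_aa(protein: str, replacement: str = "X") -> str:
    """Replace internal '*' stop symbols in a protein sequence.

    Assumption (not documented package behavior): one '*' at the very end
    marks the terminal stop and is kept; every other '*' is replaced.
    """
    if protein.endswith("*"):
        body, tail = protein[:-1], "*"
    else:
        body, tail = protein, ""
    return body.replace("*", replacement) + tail


if __name__ == "__main__":
    print(mask_internal_stops_aa("M*KW*"))  # MXKW*
```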
4 Use in scripts
In [1]: from polish_genbank import polish_gb, polish_fasta
In [2]: polish_gb?
Signature: polish_gb(ingb=None, NewInternalStopCodonNT='NNN', NewInternalStopCodonAA='X', logger=None)
Docstring:
Replace the internal stop codon with NNNs on Genbank nt sequence,
and replace the '*' in 'translation' tag (protein sequence) with 'X'
Return:
A generator.
Usage:
>>> records = polish_gb(ingb='in.gb', NewInternalStopCodonNT='NNN',
...                     NewInternalStopCodonAA='X')
>>> for rec in records:
...     print(rec.id, rec.seq)
In [3]: polish_fasta?
Signature: polish_fasta(infasta=None, NewInternalStopCodonNT='NNN', table=2, logger=None)
Docstring:
Replace the internal stop codon with NNNs.
The infasta file is assumed to be CDS sequences, and coding from +1
position.
Return:
A generator.
Usage:
>>> records = polish_fasta(infasta='myfile', NewInternalStopCodonNT='NNN', table=2)
>>> for rec in records:
...     print(rec.id, rec.seq)
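Both functions return generators, so records are produced lazily and can be streamed straight to an output file. The hypothetical helper write_fasta below writes any iterable of objects that have .id and .seq attributes (the attributes the docstrings use) in FASTA format; DemoRecord is only a stand-in for testing. If the yielded records are Biopython SeqRecord objects (an assumption the docstrings do not state), Bio.SeqIO.write(records, handle, "fasta") does the same job.

```python
import io
from collections import namedtuple
from typing import Any, Iterable, TextIO

# Stand-in record type for demonstration; real records come from
# polish_gb()/polish_fasta() and only need .id and .seq here.
DemoRecord = namedtuple("DemoRecord", ["id", "seq"])


def write_fasta(records: Iterable[Any], handle: TextIO, width: int = 60) -> int:
    """Write records as FASTA, wrapping sequence lines at `width` characters.

    Returns the number of records written.
    """
    count = 0
    for rec in records:
        seq = str(rec.seq)
        handle.write(f">{rec.id}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_fasta([DemoRecord("cds1", "ATGNNNTAA")], buf)
    print(buf.getvalue(), end="")
```

For example, write_fasta(polish_fasta(infasta='myfile', table=2), out) writes the polished sequences, where out is a text file opened for writing.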
5 Citation
I currently have no plans to publish polish_genbank. However, since polish_genbank uses Biopython, if you use polish_genbank in your work, you should also cite Biopython:
Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163
For more information, see http://www.biopython.org/.