Variant annotation in Python




varcode

varcode is a library for working with genomic variant data in Python and predicting the impact of those variants on protein sequences.

Installation

You can install varcode using pip:

pip install varcode

You can install the required reference genome data through PyEnsembl as follows:

# Downloads and installs the Ensembl releases (75 and 76)
pyensembl install --release 75 76

Example

import varcode

# Load TCGA MAF containing variants from their
variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")

print(variants)
### <VariantCollection from 'tcga-ovarian-cancer-variants.maf' with 6428 elements>
###  -- Variant(contig=1, start=69538, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=881892, ref=T, alt=G, genome=GRCh37)
###  -- Variant(contig=1, start=3389714, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=3624325, ref=G, alt=T, genome=GRCh37)
###  ...

# you can index into a VariantCollection and get back a Variant object
variant = variants[0]

# groupby_gene_name returns a dictionary whose keys are gene names
# and whose values are themselves VariantCollections
gene_groups = variants.groupby_gene_name()

# get variants which affect the TP53 gene
TP53_variants = gene_groups["TP53"]

# predict protein coding effect of every TP53 variant on
# each transcript of the TP53 gene
TP53_effects = TP53_variants.effects()

print(TP53_effects)
### <EffectCollection with 789 elements>
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R342*)
### -- ThreePrimeUTR(variant=chr17 g.7574003G>A, transcript_name=TP53-005, transcript_id=ENST00000420246)
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-002, transcript_id=ENST00000445888, effect_description=p.R342*)
### -- FrameShift(variant=chr17 g.7574030_7574030delG, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R333fs)
### ...

premature_stop_effect = TP53_effects[0]
print(str(premature_stop_effect.mutant_protein_sequence))
### 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMF'

print(premature_stop_effect.aa_mutation_start_offset)
### 341

print(premature_stop_effect.transcript)
### Transcript(id=ENST00000269305, name=TP53-001, gene_name=TP53, biotype=protein_coding, location=17:7571720-7590856)

print(premature_stop_effect.gene.name)
### 'TP53'
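In the output above, the PrematureStop effect is described as p.R342* while aa_mutation_start_offset prints 341. That pattern suggests the offset attribute is 0-based while the p. notation counts residues from 1, and that the mutant protein keeps only the residues before the new stop. The sketch below spells out that reading with two hypothetical helpers, format_protein_change and truncated_protein; they are not part of varcode's API and only illustrate the arithmetic under that assumption.

```python
def format_protein_change(ref_aa: str, offset: int, alt_aa: str) -> str:
    """Build a p. notation string from a 0-based amino acid offset.

    Hypothetical helper for illustration; not a varcode function.
    """
    # p. notation numbers residues from 1, so shift the 0-based offset by one
    return f"p.{ref_aa}{offset + 1}{alt_aa}"


def truncated_protein(reference_protein: str, stop_offset: int) -> str:
    """Return the protein left after a stop codon is gained at a 0-based offset.

    Hypothetical helper: assumes the new stop replaces the residue at
    stop_offset, so only residues 0 .. stop_offset - 1 are translated.
    """
    if not 0 <= stop_offset <= len(reference_protein):
        raise ValueError("stop_offset must fall within the protein")
    return reference_protein[:stop_offset]


if __name__ == "__main__":
    print(format_protein_change("R", 341, "*"))  # p.R342*
    print(truncated_protein("MEEPQSDPSV", 4))     # MEEP
```

Under this reading, a stop gained at offset 341 would leave a 341-residue protein, numbered 1 to 341 in p. notation.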

If you're looking for a quick-start guide, you can check out this IPython notebook, which demonstrates simple use cases of varcode.

Effect types

| Effect type | Description |
| --- | --- |
| AlternateStartCodon | Replace annotated start codon with alternative start codon (e.g. "ATG>CAG"). |
| ComplexSubstitution | Insertion and deletion of multiple amino acids. |
| Deletion | Coding mutation which causes deletion of amino acid(s). |
| ExonLoss | Deletion of entire exon, significantly disrupts protein. |
| ExonicSpliceSite | Mutation at the beginning or end of an exon, may affect splicing. |
| FivePrimeUTR | Variant affects 5' untranslated region before start codon. |
| FrameShiftTruncation | A frameshift which leads immediately to a stop codon (no novel amino acids created). |
| FrameShift | Out-of-frame insertion or deletion of nucleotides, causes novel protein sequence and often premature stop codon. |
| IncompleteTranscript | Can't determine effect since transcript annotation is incomplete (often missing either the start or stop codon). |
| Insertion | Coding mutation which causes insertion of amino acid(s). |
| Intergenic | Occurs outside of any annotated gene. |
| Intragenic | Within the annotated boundaries of a gene but not in a region that's transcribed into pre-mRNA. |
| IntronicSpliceSite | Mutation near the beginning or end of an intron but less likely to affect splicing than donor/acceptor mutations. |
| Intronic | Variant occurs between exons and is unlikely to affect splicing. |
| NoncodingTranscript | Transcript doesn't code for a protein. |
| PrematureStop | Insertion of stop codon, truncates protein. |
| Silent | Mutation in coding sequence which does not change the amino acid sequence of the translated protein. |
| SpliceAcceptor | Mutation in the last two nucleotides of an intron, likely to affect splicing. |
| SpliceDonor | Mutation in the first two nucleotides of an intron, likely to affect splicing. |
| StartLoss | Mutation causes loss of start codon, likely result is that an alternate start codon will be used downstream (possibly in a different frame). |
| StopLoss | Loss of stop codon, causes extension of protein by translation of nucleotides from 3' UTR. |
| Substitution | Coding mutation which causes simple substitution of one amino acid for another. |
| ThreePrimeUTR | Variant affects 3' untranslated region after stop codon of mRNA. |
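To make a few rows of the table concrete, here is a minimal sketch that classifies a single-codon change as Silent, Substitution, PrematureStop or StopLoss, and flags an indel whose length change is not a multiple of three as a frameshift. It assumes the standard genetic code and a codon already read off the transcript's coding strand; classify_codon_change and is_frameshift are hypothetical names, not varcode functions, and varcode's own effect prediction covers many more cases (splice sites, start codons, incomplete transcripts) than this.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each of the three codon positions
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Name the effect of replacing one codon with another (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "Silent"         # translated residue is unchanged
    if alt_aa == "*":
        return "PrematureStop"  # a sense codon became a stop codon
    if ref_aa == "*":
        return "StopLoss"       # the stop codon became a sense codon
    return "Substitution"       # one amino acid swapped for another


def is_frameshift(ref: str, alt: str) -> bool:
    """True when an indel changes the coding length by a non-multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0


if __name__ == "__main__":
    print(classify_codon_change("GCT", "GCC"))  # Silent (Ala -> Ala)
    print(classify_codon_change("CGA", "TGA"))  # PrematureStop (Arg -> stop)
    print(classify_codon_change("GAG", "GTG"))  # Substitution (Glu -> Val)
    print(is_frameshift("G", ""))               # True: 1-nt deletion
```

The ordering of the checks matters: the Silent test comes first so that a stop-to-stop change is not reported as a premature stop.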

Coordinate system

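varcode currently uses a "base counted, start at 1" genomic coordinate system to match the Ensembl annotation database. We plan to switch to "space counted, start at 0" (interbase) coordinates, since that system allows for more uniform logic (no special cases for insertions). To learn more about genomic coordinate systems, read this blog post.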
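The difference between the two conventions is easiest to see in code. This sketch, using hypothetical function names, converts a 1-based, fully closed span ("base counted, start at 1") to a 0-based, half-open interbase span ("space counted, start at 0") and back. In interbase coordinates an insertion is simply a zero-length span (p, p), while a 1-based closed interval has no natural way to describe a location that contains no bases; that is the insertion special case mentioned above. Interbase spans also line up directly with Python string slicing.

```python
from typing import Tuple


def one_based_to_interbase(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, fully closed [start, end] span to a 0-based half-open span."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # the space before base `start` is interbase position start - 1;
    # the space after base `end` is interbase position end
    return start - 1, end


def interbase_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open span back to 1-based, fully closed coordinates."""
    if start < 0 or end <= start:
        # zero-length spans (insertion points) cover no bases, so there is
        # no 1-based [start, end] interval that represents them
        raise ValueError("span must cover at least one base")
    return start + 1, end


def insertion_point(after_base: int) -> Tuple[int, int]:
    """Interbase span for an insertion placed right after 1-based base `after_base`."""
    return after_base, after_base


if __name__ == "__main__":
    seq = "ACGTACGT"
    start, end = one_based_to_interbase(3, 3)  # the single base at 1-based position 3
    print(start, end, seq[start:end])          # 2 3 G
    print(insertion_point(4))                  # (4, 4): zero-length, between bases 4 and 5
```

A substitution and an insertion then share one representation, (start, end) with end - start equal to the number of reference bases, which is the kind of uniformity the planned switch is meant to provide.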
