isovar Python package - PyPI

Determine mutant protein sequences from RNA using assembly around variants

Detailed description of the isovar Python project

isovar

Overview
Python API
Command-line
Internal Design
Other Isovar Command-line Tools
Sequencing Recommendations

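Overview

Isovar determines mutant protein subsequences around mutations from cancer RNA-seq data.

Isovar works by:

collecting RNA reads which span the location of a variant,
filtering the RNA reads to those which support the mutation,
assembling mutant reads into longer coding sequences,
matching mutant coding sequences against reference annotated reading frames, and
translating coding sequences determined directly from RNA into mutant protein sequences.

The assembled coding sequences may incorporate proximal (germline and somatic) variants, along with any splicing alterations which occur due to modified splice signals.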
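
The first two steps above (collecting reads that span a variant, then keeping those that support the mutation) can be pictured with a small stand-alone sketch. It does not use Isovar's own classes: the `Read` tuple, the helper names, and the toy sequences are hypothetical, and reads are assumed to be gap-free with 0-based start positions.

```python
from collections import namedtuple

# Hypothetical, simplified read: 0-based start plus aligned bases (no indels).
Read = namedtuple("Read", ["name", "start", "sequence"])


def reads_spanning(reads, position):
    """Keep reads whose aligned bases cover the 0-based variant position."""
    return [r for r in reads if r.start <= position < r.start + len(r.sequence)]


def reads_supporting(reads, position, allele):
    """Keep spanning reads whose base at the variant position equals `allele`."""
    return [
        r for r in reads_spanning(reads, position)
        if r.sequence[position - r.start] == allele
    ]


reads = [
    Read("r1", 95, "ACGTAGGCTA"),   # covers 95..104, carries G at position 100
    Read("r2", 98, "TATGCTAACG"),   # covers 98..107, carries T at position 100
    Read("r3", 110, "GGGTTTAAAC"),  # does not cover position 100
]
variant_position, ref, alt = 100, "G", "T"

spanning_names = [r.name for r in reads_spanning(reads, variant_position)]
alt_names = [r.name for r in reads_supporting(reads, variant_position, alt)]
print(spanning_names, alt_names)
```

Only the reads in `alt_names` would go on to the assembly step in this simplified picture.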

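Python API

In the example below, isovar.run_isovar returns a list of isovar.IsovarResult objects. Each of these objects corresponds to a single input variant and contains all of the information about the RNA evidence at that variant's location, along with any mutant protein sequences that were assembled for the variant.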

    from isovar import run_isovar

    isovar_results = run_isovar(
        variants="cancer-mutations.vcf",
        alignment_file="tumor-rna.bam")

    # this code traverses every variant and prints the number
    # of RNA reads which support the alt allele for variants
    # which had a successfully assembled/translated protein sequence
    for isovar_result in isovar_results:
        # if any protein sequences were assembled from RNA
        # then the one with most supporting reads can be
        # accessed from a property called `top_protein_sequence`.
        if isovar_result.top_protein_sequence is not None:
            # print number of distinct fragments supporting
            # the variant allele for this mutation
            print(isovar_result.variant, isovar_result.num_alt_fragments)

A collection of IsovarResult objects can also be flattened into a Pandas DataFrame:

    from isovar import run_isovar, isovar_results_to_dataframe

    df = isovar_results_to_dataframe(
        run_isovar(
            variants="cancer-mutations.vcf",
            alignment_file="tumor-rna.bam"))

Python API options for collecting RNA reads

To change how Isovar collects and filters RNA reads, create your own instance of the isovar.ReadCollector class and pass it to run_isovar.

    from isovar import run_isovar, ReadCollector

    # create a custom ReadCollector to change options for how RNA reads are processed
    read_collector = ReadCollector(
        use_duplicate_reads=True,
        use_secondary_alignments=True,
        use_soft_clipped_bases=True)

    isovar_results = run_isovar(
        variants="cancer-mutations.vcf",
        alignment_file="tumor-rna.bam",
        read_collector=read_collector)

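Python API options for coding sequence assembly and translation

To change how Isovar assembles RNA reads into coding sequences, determines their reading frames, and groups translated amino acid sequences, create your own instance of the isovar.ProteinSequenceCreator class and pass it to run_isovar.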

    from isovar import run_isovar, ProteinSequenceCreator

    # create a custom ProteinSequenceCreator to change options for how
    # protein sequences are assembled from RNA reads
    protein_sequence_creator = ProteinSequenceCreator(
        # number of amino acids we're aiming for, coding sequences
        # might still give us a shorter sequence due to an early stop
        # codon or poor coverage
        protein_sequence_length=30,
        # minimum number of reads covering each base of the coding sequence
        min_variant_sequence_coverage=2,
        # how much of a reference transcript should a coding sequence match before
        # we use it to establish a reading frame
        min_transcript_prefix_length=20,
        # how many mismatches allowed between coding sequence (before the variant)
        # and transcript (before the variant location)
        max_transcript_mismatches=2,
        # also count mismatches after the variant location toward
        # max_transcript_mismatches
        count_mismatches_after_variant=False,
        # if more than one protein sequence can be assembled for a variant
        # then drop any beyond this number
        max_protein_sequences_per_variant=1,
        # if set to False then coding sequence will be derived from
        # a single RNA read with the variant closest to its center
        variant_sequence_assembly=True,
        # how many nucleotides must two reads overlap before they are combined
        # into a single coding sequence
        min_assembly_overlap_size=30)

    isovar_results = run_isovar(
        variants="cancer-mutations.vcf",
        alignment_file="tumor-rna.bam",
        protein_sequence_creator=protein_sequence_creator)
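
To give a feel for what a minimum overlap such as min_assembly_overlap_size controls, here is a minimal sketch of merging two reads when the end of one matches the start of the other. It is not Isovar's assembly algorithm; `merge_if_overlapping` is a hypothetical helper that only handles exact suffix/prefix matches between two strings.

```python
def merge_if_overlapping(left, right, min_overlap):
    """Merge `right` onto the end of `left` when a suffix of `left` equals a
    prefix of `right` and the shared stretch is at least `min_overlap` bases.
    Returns the merged sequence, or None when no sufficient overlap exists."""
    max_possible = min(len(left), len(right))
    # Try the longest candidate overlap first so the merge is as compact as possible.
    for size in range(max_possible, min_overlap - 1, -1):
        if size > 0 and left.endswith(right[:size]):
            return left + right[size:]
    return None


merged = merge_if_overlapping("ACGTTGCA", "TGCAGGT", min_overlap=4)
rejected = merge_if_overlapping("ACGTTGCA", "TGCAGGT", min_overlap=5)
print(merged, rejected)
```

The two toy reads share only four bases ("TGCA"), so they are merged with a minimum overlap of 4 and left apart with a minimum overlap of 5.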

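Python API for filtering results

You can filter a collection of IsovarResult objects by any of their numerical properties using the filter_thresholds option of the run_isovar function. The value expected for this argument is a dictionary whose keys have names like 'min_fraction_ref_reads' or 'max_num_alt_fragments' and whose values are numerical thresholds. Everything after the 'min_' or 'max_' at the start of a key is expected to be the name of a property of IsovarResult. Many of the commonly accessed properties regarding RNA read evidence follow the pattern: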

{num|fraction}_{ref|alt|other}_{reads|fragments}
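
As a quick illustration, the pattern above expands to twelve candidate names. The snippet below only generates the strings; whether every one of them exists as a property on IsovarResult is an assumption to check against the installed version.

```python
from itertools import product

# Expand {num|fraction}_{ref|alt|other}_{reads|fragments} into concrete names.
property_names = [
    "_".join(parts)
    for parts in product(
        ("num", "fraction"),
        ("ref", "alt", "other"),
        ("reads", "fragments"),
    )
]

for name in property_names:
    print(name)
```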

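For example, the following code filters the results to have 10 or more alt reads supporting a variant and no more than 25% of the fragments supporting an allele other than the ref or alt.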

    from isovar import run_isovar

    isovar_results = run_isovar(
        variants="cancer-mutations.vcf",
        alignment_file="tumor-rna.bam",
        filter_thresholds={
            "min_num_alt_reads": 10,
            "max_fraction_other_fragments": 0.25})

    for isovar_result in isovar_results:
        # print each variant and whether it passed both filters
        print(isovar_result.variant, isovar_result.passes_all_filters)

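A variant which fails one or more filters is not excluded from the result collection, but it will have False values in its corresponding filter_values dictionary property and a False value for the passes_all_filters property.

If a result collection is flattened into a DataFrame, then each filter is included as a column.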
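
The naming convention for threshold filters can be sketched without Isovar itself. The helper below is hypothetical (it is not the library's implementation) and assumes that 'min_' means "value >= threshold" and 'max_' means "value <= threshold", which matches the "10 or more" and "no more than 25%" wording used in this section. A `SimpleNamespace` stands in for an IsovarResult.

```python
from types import SimpleNamespace


def evaluate_thresholds(result, filter_thresholds):
    """Return {filter_name: bool} for keys shaped like min_<attr> / max_<attr>."""
    outcomes = {}
    for name, threshold in filter_thresholds.items():
        kind, _, attribute = name.partition("_")
        value = getattr(result, attribute)
        if kind == "min":
            outcomes[name] = value >= threshold
        elif kind == "max":
            outcomes[name] = value <= threshold
        else:
            raise ValueError("filter name must start with 'min_' or 'max_': " + name)
    return outcomes


# Stand-in for an IsovarResult with two numerical properties.
result = SimpleNamespace(num_alt_reads=12, fraction_other_fragments=0.4)
filter_values = evaluate_thresholds(
    result,
    {"min_num_alt_reads": 10, "max_fraction_other_fragments": 0.25},
)
passes_all_filters = all(filter_values.values())
print(filter_values, passes_all_filters)
```

In this toy case the result stays in the collection but is marked as failing, because too many fragments support an allele other than ref or alt.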

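It is also possible to filter on boolean properties (without numerical thresholds) by passing filter_flags to run_isovar. These boolean properties can be negated by prepending 'not_' to the property name, so that both 'protein_sequence_matches_predicted_effect' and 'not_protein_sequence_matches_predicted_effect' are valid names for filter flags.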
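
The 'not_' prefix convention can be sketched in the same spirit. `evaluate_flags` is a hypothetical helper, not part of Isovar, and the `SimpleNamespace` again stands in for an IsovarResult.

```python
from types import SimpleNamespace


def evaluate_flags(result, filter_flags):
    """Return {flag_name: bool}, treating a leading 'not_' as negation."""
    outcomes = {}
    for name in filter_flags:
        negate = name.startswith("not_")
        attribute = name[len("not_"):] if negate else name
        value = bool(getattr(result, attribute))
        outcomes[name] = (not value) if negate else value
    return outcomes


toy_result = SimpleNamespace(protein_sequence_matches_predicted_effect=True)
flag_values = evaluate_flags(
    toy_result,
    [
        "protein_sequence_matches_predicted_effect",
        "not_protein_sequence_matches_predicted_effect",
    ],
)
print(flag_values)
```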

Command-line

Basic example:

$ isovar  \
    --vcf somatic-variants.vcf  \
    --bam rnaseq.bam \
    --protein-sequence-length 30 \
    --output isovar-results.csv

Command-line options for loading variants

  --vcf VCF             Genomic variants in VCF format
  
  --maf MAF             Genomic variants in TCGA's MAF format
  
  --variant CHR POS REF ALT
                        Individual variant as 4 arguments giving chromosome,
                        position, ref, and alt. Example: chr1 3848 C G. Use
                        '.' to indicate empty alleles for insertions or
                        deletions.
  
  --genome GENOME       What reference assembly your variant coordinates are
                        using. Examples: 'hg19', 'GRCh38', or 'mm9'. This
                        argument is ignored for MAF files, since each row
                        includes the reference. For VCF files, this is used if
                        specified, and otherwise is guessed from the header.
                        For variants specified on the commandline with
                        --variant, this option is required.
  
  --download-reference-genome-data
                        Automatically download genome reference data required
                        for annotation using PyEnsembl. Otherwise you must
                        first run 'pyensembl install' for the release/species
                        corresponding to the genome used in your VCF.
  
  --json-variants JSON_VARIANTS
                        Path to Varcode.VariantCollection object serialized as
                        a JSON file.

Command-line options for loading aligned tumor RNA-seq reads

  --bam BAM             BAM file containing RNAseq reads
  
  --min-mapping-quality MIN_MAPPING_QUALITY
                        Minimum MAPQ value to allow for a read (default 1)
  
  --use-duplicate-reads
                        By default, reads which have been marked as duplicates
                        are excluded. Use this option to include duplicate
                        reads.
                        
  --drop-secondary-alignments
                        By default, secondary alignments are included in
                        reads, use this option to instead only use primary
                        alignments.
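
The defaults described above (MAPQ of at least 1, duplicates excluded, secondary alignments included) amount to a simple per-read predicate. The sketch below is only an illustration of those rules: `ToyRead` and `keep_read` are hypothetical names, and a real BAM reader would supply these fields instead.

```python
from dataclasses import dataclass


@dataclass
class ToyRead:
    """Hypothetical stand-in for an aligned read record."""
    name: str
    mapping_quality: int
    is_duplicate: bool = False
    is_secondary: bool = False


def keep_read(read, min_mapping_quality=1, use_duplicate_reads=False,
              use_secondary_alignments=True):
    """Apply the three read-level filters described in the option help text."""
    if read.mapping_quality < min_mapping_quality:
        return False
    if read.is_duplicate and not use_duplicate_reads:
        return False
    if read.is_secondary and not use_secondary_alignments:
        return False
    return True


toy_reads = [
    ToyRead("low_mapq", 0),
    ToyRead("duplicate", 30, is_duplicate=True),
    ToyRead("secondary", 30, is_secondary=True),
    ToyRead("plain", 30),
]
kept_by_default = [r.name for r in toy_reads if keep_read(r)]
print(kept_by_default)
```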

Command-line options for coding sequence assembly

Command-line options for translating cDNA to protein sequence

Command-line options for filtering

Command-line options for writing an output CSV
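
Internal Design

The inputs to Isovar are one or more somatic variant call (VCF) files, along with a BAM file containing aligned tumor RNA reads. The following objects are used to aggregate information within Isovar:

LocusRead: Isovar examines each variant locus and extracts the reads overlapping that locus, each represented by a LocusRead. The LocusRead representation allows filtering based on quality and alignment criteria (e.g. MAPQ > 0), information which is thrown away in later stages of Isovar.

AlleleRead: Once LocusRead objects have been filtered, they are converted into a simplified representation called AlleleRead. Each AlleleRead contains only the cDNA sequences before, at, and after the variant locus.

ReadEvidence: The set of AlleleRead objects overlapping a mutation's location may support many distinct alleles. The ReadEvidence type represents the grouping of these reads into ref, alt, and other AlleleRead sets, where ref reads agree with the reference sequence, alt reads agree with the given mutation, and other reads contain all non-ref/non-alt alleles. The alt reads are used later to determine a mutant coding sequence, but the ref and other groups are also kept in case they are useful for filtering.

VariantSequence: Overlapping AlleleRead objects containing the same mutation are assembled into a longer sequence. The VariantSequence object represents this candidate coding sequence, as well as all the AlleleRead objects which were used to create it.

ReferenceContext: To determine the reading frame in which to translate a VariantSequence, Isovar looks at all Ensembl annotated transcripts overlapping the locus and collapses them into one or more ReferenceContext objects. Each ReferenceContext represents the cDNA sequence upstream of the variant locus and which of the {0, +1, +2} reading frames it is translated in.

Translation: The reading frame from a ReferenceContext is used to translate a VariantSequence into a protein fragment, represented by a Translation.

ProteinSequence: Multiple distinct variant sequences and reference contexts can generate the same translation, so equivalent Translation objects are aggregated into a ProteinSequence.

IsovarResult: Since a single variant locus might have reads which assemble into multiple incompatible coding sequences, an IsovarResult represents a variant and the one or more ProteinSequence objects associated with it. We typically don't want to deal with every possible translation of every distinct sequence detected around a variant, so the protein sequences are sorted by their number of supporting fragments and the best protein sequence is made easy to access. The IsovarResult object also has many informative properties such as num_alt_fragments, fraction_ref_reads, etc.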
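
To make the role of the reading frame concrete, here is a minimal sketch of translating a cDNA string in one of the {0, +1, +2} frames. It is a simplification rather than Isovar's Translation logic: `translate_in_frame` is a hypothetical helper, the frame is modelled as a number of leading bases to skip, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_in_frame(cdna, frame_offset):
    """Skip `frame_offset` leading bases (0, 1, or 2), then translate whole
    codons until a stop codon or the end of the sequence."""
    if frame_offset not in (0, 1, 2):
        raise ValueError("frame_offset must be 0, 1, or 2")
    protein = []
    for i in range(frame_offset, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_cdna = "CATGGCCTGTTGAGG"
for offset in (0, 1, 2):
    print(offset, translate_in_frame(toy_cdna, offset))
```

The same toy sequence yields three different protein fragments depending on the frame, which is why a matching reference context is needed before translating an assembled sequence.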
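
Other Isovar Command-line Tools

isovar-protein-sequences --vcf variants.vcf --bam rna.bam
    All protein sequences which can be assembled from RNA reads for any of the given variants.

isovar-allele-counts --vcf variants.vcf --bam rna.bam
    Counts of reads and fragments supporting the ref, alt, and other alleles at all given variant locations.

isovar-allele-reads --vcf variants.vcf --bam rna.bam
    Sequences of all reads overlapping any of the given variants.

isovar-translations --vcf variants.vcf --bam rna.bam
    All possible translations of any assembled cDNA sequence containing any of the given variants, in the reference frame of any matching transcript.

isovar-reference-contexts --vcf variants.vcf
    Shows all candidate reference contexts (sequence and reading frame) before each variant, derived from overlapping reference coding transcripts.

isovar-variant-reads --vcf variants.vcf --bam rna.bam
    Like the isovar-allele-reads command, but limited to reads which support the alt allele.

isovar-variant-sequences --vcf variants.vcf --bam rna.bam
    Shows all assembled cDNA coding sequences supporting any of the given variants.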
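
Sequencing Recommendations

Isovar works best with high quality, high coverage mRNA sequence data. This means that you will get the best results from more than 100M paired-end reads sequenced on an Illumina HiSeq from a library enriched with poly-A capture. The number of reads needed varies with the degree of RNA degradation and tumor purity. The read length determines the longest protein sequence you can recover, because only reads that overlap a variant are considered. With 100bp reads you will be able to assemble at most 199bp of sequence around a somatic single nucleotide variant, and consequently determine only 66 amino acids of the protein sequence. If you disable the cDNA assembly algorithm, then a 100bp read will only be able to determine 33 amino acids.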
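
The numbers in this section follow from simple arithmetic, sketched below with hypothetical helper names. Every read must overlap the variant base, so the widest assembly joins one read that ends at the variant with one that starts at it, and the two share that single base.

```python
def max_assembled_length(read_length):
    """Widest span covered by reads that all overlap one variant base:
    two reads sharing that base, so 2 * read_length - 1."""
    return 2 * read_length - 1


def max_amino_acids(num_bases):
    """Number of complete codons that fit in `num_bases` nucleotides."""
    return num_bases // 3


read_length = 100
assembled = max_assembled_length(read_length)
print(assembled)                     # 199
print(max_amino_acids(assembled))    # 66
print(max_amino_acids(read_length))  # 33
```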
License: BSD License (3-clause BSD)
Maintainers: openvax, hammerlab, iskander
                                

isovar 1.0.10
