Functional enrichment analysis and more via the g:Profiler toolkit
gprofiler
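The official Python 3 interface to the g:Profiler toolkit for enrichment analysis of functional (GO and other) terms, conversion between identifier namespaces, and mapping of orthologous genes in related organisms.
It has an optional dependency on pandas.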
Installing gprofiler
The recommended way of installing gprofiler is with pip:
pip install gprofiler-official
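Legacy versions
The 0.3.x series of gprofiler-official is not compatible with the 1.0.x series. The major version number was changed to indicate breaking changes in the API. To install the previous version of gprofiler-official, use the command: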
pip install gprofiler-official==0.3.5
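Because the two series are incompatible, a script may want to check which one is installed before calling the API. The sketch below assumes the distribution is registered under the name gprofiler-official (the name used in the pip commands above). The helpers is_legacy and installed_series are hypothetical names introduced for illustration; they are not part of the package.

```python
from importlib import metadata


def is_legacy(version: str) -> bool:
    """Return True for the 0.x series (e.g. 0.3.5), False for 1.0.x and later."""
    major = int(version.split(".")[0])
    return major < 1


def installed_series(dist_name: str = "gprofiler-official"):
    """Describe the installed series, or return None if the package is missing."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return "legacy (0.3.x API)" if is_legacy(version) else "current (1.0.x API)"


# Prints None when gprofiler-official is not installed in this environment.
print(installed_series())
```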
Tools
To use any of the tools in the g:Profiler toolkit, first initialize the GProfiler object.
from gprofiler import GProfiler

gp = GProfiler(
    user_agent='ExampleTool',  # optional user agent
    return_dataframe=True,  # return pandas dataframe or plain python structures
)
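Since pandas is only an optional dependency, one possible pattern is to request DataFrames only when pandas can actually be imported. This is a minimal sketch, not something the package prescribes: HAS_PANDAS and make_profiler are hypothetical names, and the import of gprofiler is deferred so the snippet itself runs without the package installed.

```python
import importlib.util

# True when pandas can be imported in this environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


def make_profiler(user_agent="ExampleTool"):
    """Create a GProfiler that returns DataFrames only if pandas is available."""
    # Imported lazily; calling this function requires gprofiler-official.
    from gprofiler import GProfiler

    return GProfiler(user_agent=user_agent, return_dataframe=HAS_PANDAS)


print("pandas available:", HAS_PANDAS)
```

With return_dataframe=False the tools return plain Python structures instead, so downstream code has to handle both shapes if it relies on this fallback.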
g:GOSt (profile)
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.profile(organism='hsapiens',
           query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'])
Output:
source | native | name | p_value | significant | description | term_size | query_size | intersection_size | effective_domain_size | precision | recall | query | parents
GO:BP | GO:0048585 | negative regulation of response to stimulus | 0.004229 | True | "Any process that stops, prevents, or reduces ... | 1610 | 7 | 6 | 17622 | 0.857143 | 0.003727 | query_1 | [GO:0048583, GO:0048519, GO:0050896]
GO:BP | GO:0002224 | toll-like receptor signaling pathway | 0.016351 | True | "Any series of molecular signals generated as ... | 133 | 7 | 3 | 17622 | 0.428571 | 0.022556 | query_1 | [GO:0002221]
GO:BP | GO:0048486 | parasympathetic nervous system development | 0.026199 | True | "The process whose specific outcome is the pro... | 19 | 7 | 2 | 17622 | 0.285714 | 0.105263 | query_1 | [GO:0048483, GO:0048731]
GO:BP | GO:0034162 | toll-like receptor 9 signaling pathway | 0.038733 | True | "Any series of molecular signals generated as ... | 23 | 7 | 2 | 17622 | 0.285714 | 0.086957 | query_1 | [GO:0002224]
GO:BP | GO:0002221 | pattern recognition receptor signaling pathway | 0.039782 | True | "Any series of molecular signals generated as ... | 179 | 7 | 3 | 17622 | 0.428571 | 0.016760 | query_1 | [GO:0002758]
CORUM | CORUM:5669 | PlexinA3-Nrp1 complex | 0.049767 | True | PlexinA3-Nrp1 complex | 2 | 2 | 1 | 3620 | 0.500000 | 0.500000 | query_1 | [CORUM:0000000]
CORUM | CORUM:5759 | PLXNA3-RANBPM complex | 0.049767 | True | PLXNA3-RANBPM complex | 2 | 2 | 1 | 3620 | 0.500000 | 0.500000 | query_1 | [CORUM:0000000]
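source is the code for the data source.
native is the ID of the enriched term/functional category in its native namespace.
name is the human-readable name of the enriched term; description is the longer description (if available).
p_value is the enrichment p-value of the term.
term_size, query_size, intersection_size and effective_domain_size are the parameters of the hypergeometric test.
query is the name of the query, which matters if multiple queries are made in a single call (e.g. gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']})).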
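The size columns can be related to one another with a few lines of arithmetic. The sketch below is an illustration rather than g:Profiler's own code: it assumes that precision is intersection_size / query_size and recall is intersection_size / term_size (which reproduces the values in the GO:0048585 row above), and it computes an uncorrected hypergeometric tail probability. The reported p_value is not expected to equal this raw value because, as far as we are aware, g:Profiler applies a multiple-testing correction by default. The function names are hypothetical.

```python
from math import comb


def precision_recall(term_size, query_size, intersection_size):
    """Assumed definitions: intersection/query size and intersection/term size."""
    return intersection_size / query_size, intersection_size / term_size


def hypergeom_tail(term_size, query_size, intersection_size, domain_size):
    """P(X >= intersection_size) when drawing query_size genes from the domain."""
    upper = min(term_size, query_size)
    total = comb(domain_size, query_size)
    hits = sum(
        comb(term_size, k) * comb(domain_size - term_size, query_size - k)
        for k in range(intersection_size, upper + 1)
    )
    return hits / total


# Values copied from the GO:0048585 row of the output above.
precision, recall = precision_recall(term_size=1610, query_size=7, intersection_size=6)
print(round(precision, 6), round(recall, 6))  # 0.857143 0.003727

# Raw, uncorrected tail probability for the same row.
raw_p = hypergeom_tail(1610, 7, 6, 17622)
print(raw_p)
```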
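Setting the parameter no_evidences=False adds the column intersections (a list of the genes that are annotated to the term and present in the query) and the column evidences (lists of GO evidence codes for the intersecting genes).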
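Note! The combined parameter changes the output structure significantly by packing the results of the different queries together.
For example:
gp.profile(query={'query1': ['NR1H4'], 'query2': ['NR1H4', 'TRIP12']}, combined=True)
Output (truncated):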
source | native | name | p_values | description | term_size | query_sizes | intersection_sizes | effective_domain_size | parents
GO:MF | GO:1902122 | chenodeoxycholic acid binding | [0.024822026073022193, 0.04964405214614093] | "Interacting selectively and non-covalently wi... | 1 | [1, 2] | [1, 1] | 17516 | [GO:0032052, GO:0005496]
GO:MF | GO:0035257 | nuclear hormone receptor binding | [1.0, 0.033391754400990514] | "Interacting selectively and non-covalently wi... | 154 | [1, 2] | [1, 2] | 17516 | [GO:0051427, GO:0061629]
GO:MF | GO:0051427 | hormone receptor binding | [1.0, 0.04929258983003374] | "Interacting selectively and non-covalently wi... | 187 | [1, 2] | [1, 2] | 17516 | [GO:0005102]
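If per-query records are easier to work with, a combined row can be unpacked again. The sketch below assumes that the entries of p_values, query_sizes and intersection_sizes follow the order in which the queries were passed; that ordering is an assumption and is not stated in the text above. unpack_combined is a hypothetical helper, not part of gprofiler-official.

```python
def unpack_combined(row, query_names):
    """Split one combined row into one plain dict per query."""
    packed = ("p_values", "query_sizes", "intersection_sizes")
    shared = {key: value for key, value in row.items() if key not in packed}
    return [
        dict(shared, query=name, p_value=p, query_size=q, intersection_size=i)
        for name, p, q, i in zip(
            query_names, row["p_values"], row["query_sizes"], row["intersection_sizes"]
        )
    ]


# Row copied from the truncated output above (description and parents omitted).
row = {
    "source": "GO:MF",
    "native": "GO:0035257",
    "name": "nuclear hormone receptor binding",
    "p_values": [1.0, 0.033391754400990514],
    "term_size": 154,
    "query_sizes": [1, 2],
    "intersection_sizes": [1, 2],
    "effective_domain_size": 17516,
}

records = unpack_combined(row, ["query1", "query2"])
for record in records:
    print(record["query"], record["p_value"], record["intersection_size"])
```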
g:Convert (convert)
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.convert(organism='hsapiens',
           query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'],
           target_namespace='ENTREZGENE_ACC')
Output:
incoming | converted | n_incoming | n_converted | name | description | namespaces | query
NR1H4 | 9971 | 1 | 1 | NR1H4 | nuclear receptor subfamily 1 group H member 4 ... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
TRIP12 | 9320 | 2 | 1 | TRIP12 | thyroid hormone receptor interactor 12 [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
UBC | 7316 | 3 | 1 | UBC | ubiquitin C [Source:HGNC Symbol;Acc:HGNC:12468] | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
FCRL3 | 115352 | 4 | 1 | FCRL3 | Fc receptor like 3 [Source:HGNC Symbol;Acc:HGN... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
PLXNA3 | 55558 | 5 | 1 | PLXNA3 | plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101] | ENTREZGENE,HGNC,WIKIGENE | query_1
GDNF | 2668 | 6 | 1 | GDNF | glial cell derived neurotrophic factor [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
VPS11 | 55823 | 7 | 1 | VPS11 | VPS11, CORVET/HOPS core subunit [Source:HGNC S... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1
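The incoming column lists the input genes; the converted column lists the corresponding genes in the target namespace (Entrez gene accession numbers in this example).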
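A common next step is to turn the incoming/converted pairs into a lookup table. The sketch below works on plain row dictionaries (for a DataFrame, df.to_dict("records") yields this shape). It assumes that converted IDs arrive as strings and that an unmapped input is marked with "N/A" or "None"; both are assumptions for illustration, and build_id_map is a hypothetical helper.

```python
def build_id_map(rows, missing=("N/A", "None", None)):
    """Map each input ID to the list of its converted IDs, skipping misses."""
    mapping = {}
    for row in rows:
        targets = mapping.setdefault(row["incoming"], [])
        converted = row["converted"]
        if converted not in missing and converted not in targets:
            targets.append(converted)
    return mapping


# Rows taken from the output above, reduced to the two relevant columns.
rows = [
    {"incoming": "NR1H4", "converted": "9971"},
    {"incoming": "TRIP12", "converted": "9320"},
    {"incoming": "PLXNA3", "converted": "55558"},
    {"incoming": "PLXNA3", "converted": "55558"},  # a repeated row is collapsed
]

id_map = build_id_map(rows)
print(id_map)
```

Keeping a list per input gene leaves room for inputs that map to more than one target identifier.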
g:Orth (orth)
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.orth(organism='hsapiens',
        query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'],
        target='mmusculus')
Output:
incoming | converted | ortholog_ensg | n_incoming | n_converted | n_result | name | description | namespaces
NR1H4 | ENSG00000012504 | ENSMUSG00000047638 | 1 | 1 | 1 | Nr1h4 | nuclear receptor subfamily 1, group H, member ... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
TRIP12 | ENSG00000153827 | ENSMUSG00000026219 | 2 | 1 | 1 | Trip12 | thyroid hormone receptor interactor 12 [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
UBC | ENSG00000150991 | ENSMUSG00000008348 | 3 | 1 | 1 | Ubc | ubiquitin C [Source:MGI Symbol;Acc:MGI:98889] | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
FCRL3 | ENSG00000160856 | N/A | 4 | 1 | 1 | N/A | N/A | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
PLXNA3 | ENSG00000130827 | ENSMUSG00000031398 | 5 | 1 | 1 | Plxna3 | plexin A3 [Source:MGI Symbol;Acc:MGI:107683] | ENTREZGENE,HGNC,WIKIGENE
GDNF | ENSG00000168621 | ENSMUSG00000022144 | 6 | 1 | 1 | Gdnf | glial cell line derived neurotrophic factor [S... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
VPS11 | ENSG00000160695 | ENSMUSG00000032127 | 7 | 1 | 1 | Vps11 | VPS11, CORVET/HOPS core subunit [Source:MGI Sy... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
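incoming is the input gene, converted is the canonical Ensembl ID for the input gene, and ortholog_ensg is the canonical Ensembl ID for the orthologous gene in the target organism.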
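In the output above, FCRL3 has N/A in the ortholog_ensg column, so it can be useful to separate genes with an ortholog from those without one. The sketch below assumes that "N/A" is the marker for a missing ortholog, as it appears in the table; split_by_ortholog is a hypothetical helper operating on plain row dictionaries.

```python
def split_by_ortholog(rows, missing="N/A"):
    """Return (found, not_found); found maps input gene -> ortholog Ensembl IDs."""
    found = {}
    seen = []
    for row in rows:
        gene = row["incoming"]
        if gene not in seen:
            seen.append(gene)
        if row["ortholog_ensg"] != missing:
            found.setdefault(gene, []).append(row["ortholog_ensg"])
    # A gene counts as not found only if none of its rows carried an ortholog.
    not_found = [gene for gene in seen if gene not in found]
    return found, not_found


# Rows taken from the output above, reduced to the two relevant columns.
rows = [
    {"incoming": "NR1H4", "ortholog_ensg": "ENSMUSG00000047638"},
    {"incoming": "UBC", "ortholog_ensg": "ENSMUSG00000008348"},
    {"incoming": "FCRL3", "ortholog_ensg": "N/A"},
]

found, not_found = split_by_ortholog(rows)
print(found)
print(not_found)
```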
g:SNPense (snpense)
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.snpense(query=['rs11734132', 'rs7961894', 'rs4305276', 'rs17396340'])
Output:
rs_id | chromosome | strand | start | end | ensgs | gene_names | variants
rs11734132 |  |  | -1 | -1 | [] | [] | {'intron_variant': 0, 'non_coding_transcript_v...
rs7961894 | 12 | + | 121927677 | 121927677 | [ENSG00000158023] | [WDR66] | {'intron_variant': 3, 'non_coding_transcript_v...
rs4305276 | 2 | + | 240555596 | 240555596 | [ENSG00000144504] | [ANKMY1] | {'intron_variant': 57, 'non_coding_transcript_...
rs17396340 | 1 | + | 10226118 | 10226118 | [ENSG00000054523] | [KIF1B] | {'intron_variant': 8, 'non_coding_transcript_v...
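rs_id is the input rs-number.
chromosome, strand, start and end encode the position of the variation.
ensgs and gene_names are lists of the protein-coding genes associated with the rs-number.
variants are the predicted variant effects.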
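Because gene_names is a list per rs-number, the result can be inverted into a gene-to-variant lookup. The sketch below assumes that an rs-number without an associated gene carries an empty list, as rs11734132 does in the output above; genes_to_snps is a hypothetical helper working on plain row dictionaries.

```python
def genes_to_snps(rows):
    """Invert the result: gene name -> rs-numbers, plus rs-numbers with no gene."""
    by_gene, without_gene = {}, []
    for row in rows:
        if not row["gene_names"]:
            without_gene.append(row["rs_id"])
        for gene in row["gene_names"]:
            by_gene.setdefault(gene, []).append(row["rs_id"])
    return by_gene, without_gene


# Rows taken from the output above, reduced to the two relevant columns.
rows = [
    {"rs_id": "rs11734132", "gene_names": []},
    {"rs_id": "rs7961894", "gene_names": ["WDR66"]},
    {"rs_id": "rs4305276", "gene_names": ["ANKMY1"]},
    {"rs_id": "rs17396340", "gene_names": ["KIF1B"]},
]

by_gene, without_gene = genes_to_snps(rows)
print(by_gene)
print(without_gene)
```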