Functional enrichment analysis and more via the g:Profiler toolkit

Detailed description of the gprofiler-official Python project


gprofiler

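Project description

The official Python 3 interface to the g:Profiler toolkit for enrichment analysis of functional (GO and other) terms, conversion between identifier namespaces, and mapping of orthologous genes in related organisms.

It has an optional dependency on pandas.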

Installing gprofiler

The recommended way to install gprofiler is with pip:

pip install gprofiler-official

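Legacy version

The 0.3.x series of gprofiler-official is not compatible with the 1.0.x series. We changed the major version number to indicate a breaking change in the API. To install the previous version of gprofiler-official, use the command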

pip install gprofiler-official==0.3.5
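Because the two series expose incompatible APIs, a script that may run against either one can check which series is installed before calling into it. The following is a minimal sketch using only the standard library; the helper names `major_series` and `installed_major` are hypothetical and not part of gprofiler-official.

```python
from importlib import metadata


def major_series(version_string):
    """Return the major version number of a dotted version string."""
    return int(version_string.split(".")[0])


def installed_major(dist_name="gprofiler-official"):
    """Return the installed major version, or None if the package is absent."""
    try:
        return major_series(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


# 0.3.x is the legacy series; 1.0.x carries the breaking API change.
print(major_series("0.3.5"))  # 0
print(major_series("1.0.0"))  # 1
print(installed_major())      # depends on the local environment
```

A caller could branch on the returned number, or simply refuse to run when it is not the series the script was written for.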

Tools

To use any of the tools in the g:Profiler toolkit, first initialize the GProfiler object.

from gprofiler import GProfiler

gp = GProfiler(
    user_agent='ExampleTool',  # optional user agent
    return_dataframe=True,  # return pandas dataframe or plain python structures
)
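Since pandas is an optional dependency, one possible approach is to request data frames only when pandas can actually be imported. This is a minimal sketch under that assumption; `gprofiler_kwargs` is a hypothetical helper, and only the `user_agent` and `return_dataframe` arguments shown above are used.

```python
import importlib.util


def gprofiler_kwargs(user_agent="ExampleTool"):
    """Build constructor arguments, enabling data frames only if pandas is importable."""
    has_pandas = importlib.util.find_spec("pandas") is not None
    return {"user_agent": user_agent, "return_dataframe": has_pandas}


kwargs = gprofiler_kwargs()
print(kwargs)

# With gprofiler-official installed, the arguments would be passed on as:
#   from gprofiler import GProfiler
#   gp = GProfiler(**kwargs)
```

Without pandas, the results come back as plain Python structures, so downstream code has to handle both shapes or pick one explicitly.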

g:GOSt (profile)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.profile(organism='hsapiens',
           query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'])

Output:

source      native                                            name   p_value  significant                                        description  term_size  query_size  intersection_size  effective_domain_size  precision    recall    query                               parents
GO:BP  GO:0048585     negative regulation of response to stimulus  0.004229         True  "Any process that stops, prevents, or reduces ...       1610           7                  6                  17622   0.857143  0.003727  query_1  [GO:0048583, GO:0048519, GO:0050896]
GO:BP  GO:0002224            toll-like receptor signaling pathway  0.016351         True  "Any series of molecular signals generated as ...        133           7                  3                  17622   0.428571  0.022556  query_1                          [GO:0002221]
GO:BP  GO:0048486      parasympathetic nervous system development  0.026199         True  "The process whose specific outcome is the pro...         19           7                  2                  17622   0.285714  0.105263  query_1              [GO:0048483, GO:0048731]
GO:BP  GO:0034162          toll-like receptor 9 signaling pathway  0.038733         True  "Any series of molecular signals generated as ...         23           7                  2                  17622   0.285714  0.086957  query_1                          [GO:0002224]
GO:BP  GO:0002221  pattern recognition receptor signaling pathway  0.039782         True  "Any series of molecular signals generated as ...        179           7                  3                  17622   0.428571  0.016760  query_1                          [GO:0002758]
CORUM  CORUM:5669                           PlexinA3-Nrp1 complex  0.049767         True                              PlexinA3-Nrp1 complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
CORUM  CORUM:5759                           PLXNA3-RANBPM complex  0.049767         True                              PLXNA3-RANBPM complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
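  • source is the code of the data source.
  • native is the ID of the enriched term/functional category in its native namespace.
  • name is the readable name of the enriched term; description is the longer description, if available.
  • p_value is the enrichment p-value of the term.
  • term_size, query_size, intersection_size and effective_domain_size are the parameters of the hypergeometric test.
  • query is the name of the query. It matters when several queries are made in a single call (e.g. gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']})).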
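The size columns can be related to the other numeric columns with a few lines of arithmetic. The sketch below is an illustration, not g:Profiler's implementation. It assumes that precision is intersection_size / query_size and recall is intersection_size / term_size, which is inferred from the example rows rather than stated in the text. It also computes a plain, uncorrected hypergeometric tail probability, which should not be expected to equal the reported p_value, since that value is assumed here to include a multiple-testing correction.

```python
from math import comb


def precision(intersection_size, query_size):
    """Fraction of the query genes that are annotated to the term."""
    return intersection_size / query_size


def recall(intersection_size, term_size):
    """Fraction of the term's genes that appear in the query."""
    return intersection_size / term_size


def hypergeom_tail(intersection_size, query_size, term_size, effective_domain_size):
    """P(X >= intersection_size) when query_size genes are drawn without
    replacement from a domain in which term_size genes carry the term."""
    total = comb(effective_domain_size, query_size)
    upper = min(query_size, term_size)
    hits = sum(
        comb(term_size, k) * comb(effective_domain_size - term_size, query_size - k)
        for k in range(intersection_size, upper + 1)
    )
    return hits / total


# Values taken from the GO:0048585 row of the example output.
print(round(precision(6, 7), 6))   # 0.857143
print(round(recall(6, 1610), 6))   # 0.003727
print(hypergeom_tail(6, 7, 1610, 17622))  # raw tail probability, uncorrected
```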

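Setting the parameter no_evidences=False adds the column intersections (the list of genes that are annotated to the term and present in the query) and the column evidences (the GO evidence codes for the intersecting genes).

Note: the parameter combined changes the output structure significantly by packing the results of the different queries together. For example:

gp.profile(query={'query1': ['NR1H4'], 'query2': ['NR1H4', 'TRIP12']}, combined=True)

Output (truncated):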

source      native                                               name                                     p_values                                        description  term_size query_sizes intersection_sizes  effective_domain_size                                           parents
GO:MF  GO:1902122                      chenodeoxycholic acid binding  [0.024822026073022193, 0.04964405214614093]  "Interacting selectively and non-covalently wi...          1      [1, 2]             [1, 1]                  17516                          [GO:0032052, GO:0005496]
GO:MF  GO:0035257                   nuclear hormone receptor binding                  [1.0, 0.033391754400990514]  "Interacting selectively and non-covalently wi...        154      [1, 2]             [1, 2]                  17516                          [GO:0051427, GO:0061629]
GO:MF  GO:0051427                           hormone receptor binding                   [1.0, 0.04929258983003374]  "Interacting selectively and non-covalently wi...        187      [1, 2]             [1, 2]                  17516                                      [GO:0005102]
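Combined results can be unpacked back into one record per term and query. The following is a minimal sketch that works on plain dictionaries shaped like the rows above. It assumes that positions in p_values, query_sizes and intersection_sizes follow the order of the queries in the call, which the text does not confirm. `unpack_combined` is a hypothetical helper and the 0.05 cut-off is only illustrative.

```python
def unpack_combined(rows, query_names):
    """Turn combined rows into one flat record per (term, query) pair."""
    records = []
    for row in rows:
        for i, query in enumerate(query_names):
            records.append({
                "query": query,
                "native": row["native"],
                "name": row["name"],
                "p_value": row["p_values"][i],
                "query_size": row["query_sizes"][i],
                "intersection_size": row["intersection_sizes"][i],
            })
    return records


# Two rows copied from the truncated output above.
combined_rows = [
    {"native": "GO:1902122", "name": "chenodeoxycholic acid binding",
     "p_values": [0.024822026073022193, 0.04964405214614093],
     "query_sizes": [1, 2], "intersection_sizes": [1, 1]},
    {"native": "GO:0035257", "name": "nuclear hormone receptor binding",
     "p_values": [1.0, 0.033391754400990514],
     "query_sizes": [1, 2], "intersection_sizes": [1, 2]},
]

flat = unpack_combined(combined_rows, ["query1", "query2"])
significant = [r for r in flat if r["p_value"] < 0.05]
for record in significant:
    print(record["query"], record["native"], record["p_value"])
```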

g:Convert (convert)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.convert(organism='hsapiens',
           query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'],
           target_namespace='ENTREZGENE_ACC')

Output:

incoming converted  n_incoming  n_converted    name                                        description                           namespaces    query
  NR1H4      9971           1            1   NR1H4  nuclear receptor subfamily 1 group H member 4 ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 TRIP12      9320           2            1  TRIP12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
    UBC      7316           3            1     UBC    ubiquitin C [Source:HGNC Symbol;Acc:HGNC:12468]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  FCRL3    115352           4            1   FCRL3  Fc receptor like 3 [Source:HGNC Symbol;Acc:HGN...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 PLXNA3     55558           5            1  PLXNA3       plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101]             ENTREZGENE,HGNC,WIKIGENE  query_1
   GDNF      2668           6            1    GDNF  glial cell derived neurotrophic factor [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  VPS11     55823           7            1   VPS11  VPS11, CORVET/HOPS core subunit [Source:HGNC S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 PLXNA3     55558           5            1  PLXNA3       plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101]             ENTREZGENE,HGNC,WIKIGENE  query_1

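The column incoming lists the input genes; converted lists the corresponding genes in the target namespace (Entrez Gene accession numbers in this example).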
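A conversion result is often reduced to a lookup table from input identifier to target identifiers. The sketch below does this for plain dictionaries with the incoming and converted keys described above. The set of placeholder values treated as "not converted" is an assumption, the input `NOT_A_GENE` is made up for illustration, and `build_id_map` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed placeholders for identifiers that could not be converted.
MISSING = {None, "", "None", "N/A"}


def build_id_map(rows):
    """Map each input identifier to its distinct converted identifiers."""
    mapping = defaultdict(list)
    for row in rows:
        ids = mapping[row["incoming"]]  # creates the key even when unmapped
        target = row["converted"]
        if target not in MISSING and target not in ids:
            ids.append(target)
    return dict(mapping)


rows = [
    {"incoming": "NR1H4", "converted": "9971"},
    {"incoming": "PLXNA3", "converted": "55558"},
    {"incoming": "PLXNA3", "converted": "55558"},    # repeated row, as in the output
    {"incoming": "NOT_A_GENE", "converted": "None"},  # made-up unmapped input
]

id_map = build_id_map(rows)
print(id_map)
```

Keeping a list per input leaves room for one-to-many conversions, and an empty list makes unmapped inputs easy to spot.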

g:Orth (orth)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.orth(organism='hsapiens',
        query=['NR1H4', 'TRIP12', 'UBC', 'FCRL3', 'PLXNA3', 'GDNF', 'VPS11'],
        target='mmusculus')

Output:

incoming        converted       ortholog_ensg  n_incoming  n_converted  n_result    name                                        description                           namespaces
  NR1H4  ENSG00000012504  ENSMUSG00000047638           1            1         1   Nr1h4  nuclear receptor subfamily 1, group H, member ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 TRIP12  ENSG00000153827  ENSMUSG00000026219           2            1         1  Trip12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
    UBC  ENSG00000150991  ENSMUSG00000008348           3            1         1     Ubc      ubiquitin C [Source:MGI Symbol;Acc:MGI:98889]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  FCRL3  ENSG00000160856                 N/A           4            1         1     N/A                                                N/A  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 PLXNA3  ENSG00000130827  ENSMUSG00000031398           5            1         1  Plxna3       plexin A3 [Source:MGI Symbol;Acc:MGI:107683]             ENTREZGENE,HGNC,WIKIGENE
   GDNF  ENSG00000168621  ENSMUSG00000022144           6            1         1    Gdnf  glial cell line derived neurotrophic factor [S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  VPS11  ENSG00000160695  ENSMUSG00000032127           7            1         1   Vps11  VPS11, CORVET/HOPS core subunit [Source:MGI Sy...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE

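incoming is the input gene, converted is the canonical Ensembl ID of the input gene, and ortholog_ensg is the canonical Ensembl ID of the orthologous gene in the target organism.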
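In the example output, FCRL3 has no mouse ortholog and its ortholog_ensg is shown as N/A. A minimal sketch for separating mapped from unmapped inputs follows. It assumes rows are plain dictionaries and that a missing ortholog is represented by the string "N/A", as in the table. `split_orthologs` is a hypothetical helper.

```python
def split_orthologs(rows, missing="N/A"):
    """Return (found, unmapped): orthologs per input gene, and inputs without one."""
    found, unmapped = {}, []
    for row in rows:
        ortholog = row["ortholog_ensg"]
        if ortholog == missing:
            unmapped.append(row["incoming"])
        else:
            found.setdefault(row["incoming"], []).append(ortholog)
    return found, unmapped


# Three rows copied from the example output.
rows = [
    {"incoming": "NR1H4", "ortholog_ensg": "ENSMUSG00000047638"},
    {"incoming": "FCRL3", "ortholog_ensg": "N/A"},
    {"incoming": "GDNF", "ortholog_ensg": "ENSMUSG00000022144"},
]

found, unmapped = split_orthologs(rows)
print(found)
print(unmapped)
```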

g:SNPense (snpense)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.snpense(query=['rs11734132', 'rs7961894', 'rs4305276', 'rs17396340'])

Output:

rs_id chromosome strand      start        end              ensgs gene_names                                           variants
rs11734132                           -1         -1                 []         []  {'intron_variant': 0, 'non_coding_transcript_v...
 rs7961894         12      +  121927677  121927677  [ENSG00000158023]    [WDR66]  {'intron_variant': 3, 'non_coding_transcript_v...
 rs4305276          2      +  240555596  240555596  [ENSG00000144504]   [ANKMY1]  {'intron_variant': 57, 'non_coding_transcript_...
rs17396340          1      +   10226118   10226118  [ENSG00000054523]    [KIF1B]  {'intron_variant': 8, 'non_coding_transcript_v...

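  • rs_id is the input rs-number.
  • chromosome, strand, start and end encode the position of the variant.
  • ensgs and gene_names are the lists of protein-coding genes associated with the rs-number.
  • variants are the predicted variant effects.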
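The first row of the example output shows an rs-number that could not be placed: the chromosome is empty and start and end are -1. The sketch below separates such rows from located variants and keeps only the effect types with a non-zero count. It assumes plain dictionaries with the columns listed above, chromosome as a string and start and end as integers. `summarize_snps` is a hypothetical helper, and the variants dictionaries are cut down to the one key that is fully visible in the output.

```python
def summarize_snps(rows):
    """Return (located, unlocated) for a list of SNPense-style rows."""
    located, unlocated = {}, []
    for row in rows:
        if not row["chromosome"] or row["start"] < 0:
            unlocated.append(row["rs_id"])
            continue
        # Keep only effect types that were predicted at least once.
        effects = sorted(k for k, v in row["variants"].items() if v > 0)
        located[row["rs_id"]] = {
            "position": (row["chromosome"], row["start"], row["end"]),
            "genes": list(row["gene_names"]),
            "effects": effects,
        }
    return located, unlocated


rows = [
    {"rs_id": "rs11734132", "chromosome": "", "start": -1, "end": -1,
     "gene_names": [], "variants": {"intron_variant": 0}},
    {"rs_id": "rs7961894", "chromosome": "12", "start": 121927677, "end": 121927677,
     "gene_names": ["WDR66"], "variants": {"intron_variant": 3}},
]

located, unlocated = summarize_snps(rows)
print(located)
print(unlocated)
```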
