Returning values from a Python Entrez dictionary

# Pull the Entrez gene page for MAP1B using Biopython
from Bio import Entrez

Entrez.email = "jamayfie@vasci.umass.edu"
handle = Entrez.efetch(db="gene", id="4131", retmode="xml")
record = Entrez.read(handle)
handle.close()

PPI_Entrez = []
PPI_Sym = []

# Find the Dictionary that contains the Interaction table
for x in range(1, len(record[0]["Entrezgene_comments"])):
    if ('Gene-commentary_heading', 'Interactions') in record[0]["Entrezgene_comments"][x].items():
        for y in range(0, len(record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'])):
            EntrezID = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_src']['Dbtag']['Dbtag_tag']['Object-id']['Object-id_id']
            PPI_Entrez.append(EntrezID)
            Sym = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor']
            PPI_Sym.append(Sym)

# Return the desired values: I want the Entrez ID and Gene symbol for each interacting protein
PPI_Entrez  # Returns the EntrezID
PPI_Sym  # Returns the gene symbol

1 answer

User

#1 · Posted 2024-05-15 15:00:39

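I'm not sure about XPath in Python, but if the code works, I wouldn't worry about shortening the full path or about whether the Entrez Gene XML will change. Since you tried R first, you could use a system call to Entrez Direct, as below, or a package like rentrez to get the XML.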

library(XML)
doc <- xmlParse( system("efetch -db=gene -id=4131 -format xml", intern=TRUE) )

Next, get the nodes corresponding to the rows in the table at http://www.ncbi.nlm.nih.gov/gene/4131#interactions

[code block not preserved in the source; it assigns the selected interaction nodes to x]

Try something simple first

xmlToDataFrame(x[1:4])

  Gene-commentary_type  Gene-commentary_text Gene-commentary_refs Gene-commentary_source                         Gene-commentary_comment
1                   18   Affinity Capture-MS             24457600   BioGRID110304BioGRID   255BioGRID110304255GeneID8726EEDBioGRID114265
2                   18 Reconstituted Complex             20195357   BioGRID110304BioGRID   255BioGRID110304255GeneID2353FOSBioGRID108636
3                   18 Reconstituted Complex             20195357   BioGRID110304BioGRID 255BioGRID110304255GeneID1936EEF1DBioGRID108256
4                   18   Affinity Capture-MS     2345592220562859   BioGRID110304BioGRID  255BioGRID110304255GeneID6789STK4BioGRID112665
  Gene-commentary_create-date Gene-commentary_update-date
1                  2014461120                201410513330
2                201312810490                201410513330
3                201312810490                201410513330
4                 20137710360                201410513330

Some tags, such as text, refs, source and dates, should be easy to parse

sapply(x, function(x) paste( xpathSApply(x, ".//PubMedId", xmlValue), collapse=", "))
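Since the question uses Python, the same "search by tag name, skip the full path" idea can be sketched against the nested dictionaries and lists that Entrez.read returns. This is only a minimal sketch: find_values is a hypothetical helper, the sample entry is hand-written rather than fetched from NCBI, and it assumes the parsed record behaves like ordinary dicts and lists.

```python
def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, in case the key also appears deeper down
            yield from find_values(v, key)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from find_values(item, key)


# Hypothetical, hand-made stand-in for one interaction entry (not real API output)
sample = {
    "Gene-commentary_text": "Affinity Capture-MS",
    "Gene-commentary_refs": [
        {"Pub": {"Pub_pmid": {"PubMedId": "23455922"}}},
        {"Pub": {"Pub_pmid": {"PubMedId": "20562859"}}},
    ],
}

# Collapse all PubMed IDs of the entry into one string, like the R call above
pmids = ", ".join(find_values(sample, "PubMedId"))
print(pmids)
```

On a real record, the same helper could be pointed at each entry of the Interactions comment instead of the sample dictionary.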

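I'm not sure how the comments or the Product, Interactant and Other Gene listed in the table are stored in the XML, but here I get one to three symbols and three IDs for each node.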

sapply(x, function(x) paste( xpathSApply(x, ".//Gene-commentary_comment//Other-source_anchor", xmlValue), collapse=" + "))
sapply(x, function(x) paste( xpathSApply(x, ".//Gene-commentary_comment//Object-id_id", xmlValue), collapse=" + "))
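For the XPath side of the question, Python's standard library xml.etree.ElementTree accepts the same relative paths used in the two R calls above. The snippet below is a rough sketch rather than a tested pipeline: SAMPLE_XML is a simplified, hand-written fragment shaped like one interaction node (the real Entrez Gene XML has more elements), and joined is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment; NOT the full Entrez Gene structure
SAMPLE_XML = """
<Gene-commentary>
  <Gene-commentary_text>Affinity Capture-MS</Gene-commentary_text>
  <Gene-commentary_comment>
    <Gene-commentary>
      <Gene-commentary_source>
        <Other-source>
          <Other-source_src><Dbtag><Dbtag_db>GeneID</Dbtag_db>
            <Dbtag_tag><Object-id><Object-id_id>8726</Object-id_id></Object-id></Dbtag_tag>
          </Dbtag></Other-source_src>
          <Other-source_anchor>EED</Other-source_anchor>
        </Other-source>
        <Other-source>
          <Other-source_src><Dbtag><Dbtag_db>BioGRID</Dbtag_db>
            <Dbtag_tag><Object-id><Object-id_id>114265</Object-id_id></Object-id></Dbtag_tag>
          </Dbtag></Other-source_src>
        </Other-source>
      </Gene-commentary_source>
    </Gene-commentary>
  </Gene-commentary_comment>
</Gene-commentary>
"""


def joined(node, path, sep=" + "):
    """Join the text of every element matching `path` below `node`."""
    return sep.join(el.text for el in node.findall(path))


node = ET.fromstring(SAMPLE_XML.strip())
symbols = joined(node, ".//Gene-commentary_comment//Other-source_anchor")
ids = joined(node, ".//Gene-commentary_comment//Object-id_id")
print(symbols)
print(ids)
```

ElementTree only implements a subset of XPath, so more complex expressions (for example, predicates on text in older Python versions) may need lxml instead.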

Finally, since I think Entrez Gene just copies BioGRID in full, you could also try that site. BioGRID has a very simple REST service, but you have to register for a key.

url <- "http://webservice.thebiogrid.org/interactions?geneList=MAP1B&taxId=9606&includeHeader=TRUE&accesskey=[ your ACCESSKEY ]"

biogrid <- read.delim(url)
 dim(biogrid)
[1] 58 24

head(biogrid[, c(8:9,12)])
  Official.Symbol.Interactor.A Official.Symbol.Interactor.B      Experimental.System
1                       ANP32A                        MAP1B               Two-hybrid
2                        MAP1B                       ANP32A               Two-hybrid
3                       RASSF1                        MAP1B Affinity Capture-Western
4                       RASSF1                        MAP1B               Two-hybrid
5                       ANP32A                        MAP1B Affinity Capture-Western
6                          GAN                        MAP1B Affinity Capture-Western
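A Python counterpart of the BioGRID request might look like the sketch below. It reuses only the URL and query parameters shown above; build_url and parse_interactions are hypothetical helper names, the column names assume the header row uses spaces where R shows dots, and the sample text is a hand-made excerpt of the rows printed above rather than a live response.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://webservice.thebiogrid.org/interactions"


def build_url(gene, tax_id, access_key):
    """Assemble the request URL from the parameters used in the R example."""
    params = {
        "geneList": gene,
        "taxId": tax_id,
        "includeHeader": "TRUE",
        "accesskey": access_key,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_interactions(text):
    """Return (interactor A, interactor B, experimental system) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (
            row["Official Symbol Interactor A"],
            row["Official Symbol Interactor B"],
            row["Experimental System"],
        )
        for row in reader
    ]


# Hand-made excerpt with assumed column names; a real response has more columns
sample = (
    "Official Symbol Interactor A\tOfficial Symbol Interactor B\tExperimental System\n"
    "ANP32A\tMAP1B\tTwo-hybrid\n"
    "RASSF1\tMAP1B\tAffinity Capture-Western\n"
)

url = build_url("MAP1B", 9606, "YOUR_ACCESSKEY")  # placeholder key
rows = parse_interactions(sample)
print(url)
print(rows)
```

With a registered key, the text passed to parse_interactions would come from fetching the URL (for example with urllib.request.urlopen) instead of the sample string.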
