Python API that consumes the biomart webservice
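What it will do:
- Show all databases of a biomart server
- Show all datasets of a biomart database
- Show the attributes and filters of a biomart dataset
- Run a query written as a Python dict and return the biomart response as TSV.
What it won't do:
- Process the results and return them as JSON, XML, etc.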
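Because the response comes back as TSV only, any conversion to JSON has to happen on the caller's side. The following is a minimal sketch using only the standard library, assuming the TSV text includes a header row (as requested with header = 1). The helper name tsv_to_json and the sample rows are made up for illustration and are not part of the biomart module.

```python
import csv
import io
import json


def tsv_to_json(tsv_text):
    """Convert TSV text whose first row is a header into a JSON array string."""
    # QUOTE_NONE keeps any double quotes inside fields untouched.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return json.dumps(list(reader), indent=2)


# Invented sample standing in for response.text from a search with header = 1.
sample = (
    "Accession\tProtein name\n"
    "Q9FMA1\tExample protein A\n"
    "Q8LFJ9\tExample protein B\n"
)
records = json.loads(tsv_to_json(sample))
print(records[0]["Accession"])
```

With a real query, response.text would take the place of sample.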
Usage
Import the biomart module
from biomart import BiomartServer
Connect to a biomart server
server = BiomartServer( "http://www.biomart.org/biomart" )

# if you are behind a proxy
import os
server.http_proxy = os.environ.get('http_proxy', 'http://my_http_proxy.org')

# set verbose to True to get some messages
server.verbose = True
Interact with the biomart server
# show server databases
server.show_databases() # uses pprint behind the scenes

# show server datasets
server.show_datasets() # uses pprint behind the scenes

# use the 'uniprot' dataset
uniprot = server.datasets['uniprot']

# show all available filters and attributes of the 'uniprot' dataset
uniprot.show_filters() # uses pprint
uniprot.show_attributes() # uses pprint
Run a search
# run a search with the default attributes - equivalent to hitting "Results" on the web interface.
# this will return a lot of data.
response = uniprot.search()
response = uniprot.search( header = 1 ) # if you need the columns header

# response format is TSV
for line in response.iter_lines():
    line = line.decode('utf-8')
    print(line.split("\t"))

# run a count - equivalent to hitting "Count" on the web interface
response = uniprot.count()
print(response.text)

# run a search with custom filters and default attributes.
response = uniprot.search({
    'filters': {
        'accession': 'Q9FMA1'
    }
}, header = 1 )

response = uniprot.search({
    'filters': {
        'accession': ['Q9FMA1', 'Q8LFJ9'] # ID-list specified accessions
    }
}, header = 1 )

# run a search with custom filters and attributes (no header)
response = uniprot.search({
    'filters': {
        'accession': ['Q9FMA1', 'Q8LFJ9']
    },
    'attributes': [
        'accession', 'protein_name'
    ]
})
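Since a default search can return a lot of data, it may help to consume the response lazily instead of loading it all at once. The sketch below assumes the lines arrive as bytes (as they do from response.iter_lines()) and that the first non-empty line is the column header, which requires header = 1. The function iter_rows and the sample lines are hypothetical and not part of the biomart module.

```python
def iter_rows(lines, encoding="utf-8"):
    """Yield one dict per data row from an iterable of TSV byte lines.

    The first non-empty line is treated as the column header.
    """
    header = None
    for raw in lines:
        if not raw:
            continue  # skip empty lines
        fields = raw.decode(encoding).split("\t")
        if header is None:
            header = fields
        else:
            yield dict(zip(header, fields))


# Invented sample standing in for response.iter_lines().
sample_lines = [
    b"Accession\tProtein name",
    b"",
    b"Q9FMA1\tExample protein A",
]
rows = list(iter_rows(sample_lines))
print(rows)
```

With a real query, iter_rows(response.iter_lines()) would yield the rows one at a time.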
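Shortcut function: connect directly to a biomart dataset
The code is short, but it may take a long time to run, because the module has to fetch all of the server's databases to find the dataset.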
from biomart import BiomartDataset

interpro = BiomartDataset( "http://www.biomart.org/biomart", name = 'entry' )

response = interpro.search({
    'filters': {
        'entry_id': 'IPR027603'
    },
    'attributes': [
        'entry_name', 'abstract'
    ]
})
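If the same dataset is needed in several places within one process, reusing a single object may avoid paying the lookup cost more than once. The sketch below assumes that the expensive work is tied to the dataset object, which the description above suggests but does not guarantee. The names make_cached_loader and fake_factory are hypothetical; fake_factory only stands in for BiomartDataset so the example runs without a network connection.

```python
from functools import lru_cache


def make_cached_loader(factory):
    """Wrap a dataset factory so each (url, name) pair is built only once."""
    @lru_cache(maxsize=None)
    def load(url, name):
        return factory(url, name=name)
    return load


# Stand-in factory that records how often it is called.
calls = []


def fake_factory(url, name):
    calls.append((url, name))
    return {"url": url, "name": name}


load_dataset = make_cached_loader(fake_factory)
first = load_dataset("http://www.biomart.org/biomart", "entry")
second = load_dataset("http://www.biomart.org/biomart", "entry")
print(first is second, len(calls))
```

With the biomart package installed, make_cached_loader(BiomartDataset) would be the equivalent call.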