merpy
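Use MER scripts inside Python.
(from the MER repository)
MER is a named-entity recognition tool: given any lexicon and any input text, it returns the list of terms recognized in the text, including their exact location (annotations).
Given an ontology (OWL file), MER can also link the entities to their classes.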
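To make the idea of lexicon-based annotation and linking concrete, here is a toy sketch in plain Python. It is not MER's implementation (MER itself is built on awk and grep); the `annotate` function and the `links` dictionary are hypothetical names used only for illustration, and the two URIs are taken from the usage example in this README.

```python
import re


def annotate(text, lexicon):
    """Return [start, end, term] for every case-insensitive whole-word match."""
    found = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append([match.start(), match.end(), match.group(0)])
    return sorted(found)


# Hypothetical term-to-class table standing in for an ontology.
links = {
    "fever": "http://purl.obolibrary.org/obo/HP_0001945",
    "runny nose": "http://purl.obolibrary.org/obo/HP_0031417",
}

text = "A high fever and a runny nose."
annotations = annotate(text, list(links))

# Linking step: attach the class URI to each recognized term.
linked = [row + [links[row[2].lower()]] for row in annotations]

for row in linked:
    print(row)
```

Each row holds the start offset, the end offset (exclusive), the matched text, and the linked class, which mirrors the shape of the annotations described above.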
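More information about MER can be found in:
- MER: a shell script and annotation server for minimal named entity recognition and linking, F. Couto and A. Lamurias, Journal of Cheminformatics, 10:58, 2018 [https://doi.org/10.1186/s13321-018-0312-9]
- MER: a minimal named-entity recognition tagger and annotation server, F. Couto, L. Campos, and A. Lamurias, in BioCreative V.5 Challenge Evaluation, 2017 [https://www.researchgate.net/publication/316545534_mer_a_minimal_named-entity_recognition_tagger_and_annotation_server]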
Dependencies
awk
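MER was developed and tested using GNU awk (gawk) and grep. If your machine uses a different awk interpreter, there is no guarantee that the program will work correctly.
For example, to install GNU awk on Ubuntu: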
sudo apt-get install gawk
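Before using merpy it can help to confirm that the required tools are on the PATH. The following is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper and not part of merpy.

```python
import shutil


def missing_tools(tools=("gawk", "grep")):
    """Return the names of the given command-line tools not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("gawk and grep are available")
```

This only checks that the executables exist; it does not verify which awk implementation a plain `awk` command resolves to.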
Installation
pip install merpy
or
python setup.py install
Basic Usage
>>> import merpy
>>> merpy.process_lexicon("hp")
>>> document = 'Influenza, commonly known as "the flu", is an infectious disease caused by an influenza virus. Symptoms can be mild to severe. The most common symptoms include: a high fever, runny nose, sore throat, muscle pains, headache, coughing, and feeling tired'
>>> entities = merpy.get_entities(document, "hp")
>>> print(entities)
[['111', '115', 'mild', 'http://purl.obolibrary.org/obo/HP_0012825'], ['119', '125', 'severe', 'http://purl.obolibrary.org/obo/HP_0012828'], ['168', '173', 'fever', 'http://purl.obolibrary.org/obo/HP_0001945'], ['214', '222', 'headache', 'http://purl.obolibrary.org/obo/HP_0002315'], ['224', '232', 'coughing', 'http://purl.obolibrary.org/obo/HP_0012735'], ['246', '251', 'tired', 'http://purl.obolibrary.org/obo/HP_0012378'], ['175', '185', 'runny nose', 'http://purl.obolibrary.org/obo/HP_0031417']]
>>> lexicons = merpy.get_lexicons()
>>> merpy.show_lexicons()
lexicons preloaded:
['lexicon', 'go', 'cell_line_and_cell_type', 'chebi_lite', 'chemical', 'hp', 'disease', 'wordnet_nouns', 'hpo', 'radlex', 'doid', 'protein', 'hpomultilang', 'tissue_and_organ', 'mirna', 'subcellular_structure']
lexicons loaded ready to use:
['lexicon', 'doid', 'hp']
lexicons with linked concepts:
['doid', 'hp', 'go', 'chebi_lite', 'lexicon']
>>> merpy.create_lexicon(["gene1", "gene2", "gene3"], "genelist")
wrote genelist lexicon
>>> merpy.process_lexicon("genelist")
>>> merpy.download_lexicon("https://github.com/lasigeBioTM/MER/raw/biocreative2017/data/ChEBI.txt", "chebi")
wrote chebi lexicon
>>> merpy.process_lexicon("chebi")
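The entity list in the session above stores offsets as strings and is not ordered by position. The sketch below shows one way to post-process it into typed records; it does not call merpy and works on the printed output copied from the example. The `Entity` type and `to_entities` helper are hypothetical names, and the handling of rows without a fourth element assumes that lexicons without linked concepts produce three-element rows.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    start: int          # offset of the first character
    end: int            # offset one past the last character
    text: str
    uri: Optional[str]  # linked class, if the lexicon provides one


document = 'Influenza, commonly known as "the flu", is an infectious disease caused by an influenza virus. Symptoms can be mild to severe. The most common symptoms include: a high fever, runny nose, sore throat, muscle pains, headache, coughing, and feeling tired'

# Rows copied from the printed output of merpy.get_entities(document, "hp").
raw_entities = [
    ['111', '115', 'mild', 'http://purl.obolibrary.org/obo/HP_0012825'],
    ['119', '125', 'severe', 'http://purl.obolibrary.org/obo/HP_0012828'],
    ['168', '173', 'fever', 'http://purl.obolibrary.org/obo/HP_0001945'],
    ['214', '222', 'headache', 'http://purl.obolibrary.org/obo/HP_0002315'],
    ['224', '232', 'coughing', 'http://purl.obolibrary.org/obo/HP_0012735'],
    ['246', '251', 'tired', 'http://purl.obolibrary.org/obo/HP_0012378'],
    ['175', '185', 'runny nose', 'http://purl.obolibrary.org/obo/HP_0031417'],
]


def to_entities(rows):
    """Convert string rows to Entity records ordered by start offset."""
    records = [
        Entity(int(row[0]), int(row[1]), row[2], row[3] if len(row) > 3 else None)
        for row in rows
    ]
    return sorted(records, key=lambda entity: entity.start)


entities = to_entities(raw_entities)
for entity in entities:
    # The offsets work directly as a Python slice into the document.
    print(entity.start, entity.end, document[entity.start:entity.end], entity.uri)
```

Converting the offsets to integers once makes it straightforward to sort, slice, or highlight the recognized terms later.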