A Python module for Named Entity Recognition (NER).
ner-d
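ner-d is a Python module for Named Entity Recognition (NER). Named entity recognition (also known as entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate named entities mentioned in unstructured text and classify them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.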
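To make "locate and classify" concrete, here is a minimal toy sketch that uses hand-written regular expressions instead of a trained language model. It is not how ner-d works; the `PATTERNS` table and the `find_entities` helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical toy patterns; a real NER model learns such cues from data.
PATTERNS = {
    "DATE": re.compile(
        r"(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}"
    ),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
}


def find_entities(text):
    """Return (entity_text, label) pairs in order of appearance."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.group(), label))
    matches.sort()  # order by position in the text
    return [(entity, label) for _, entity, label in matches]


sample = "GitHub launched April 10, 2008, grew 25% and raised $1,000."
print(find_entities(sample))
```

Rules like these only cover rigid formats. Categories such as person or organization names generally need a statistical model, which is the part ner-d delegates to the selected language model.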
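It is simple to use, with a single main function and the flexibility to choose the language model. If the model has not been downloaded before, ner-d downloads it automatically and links it on the system. It then finds entities in the given block of text.
Prerequisites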
- The dependencies listed in requirements.txt, which are installed automatically when you install with pip.
Installation
To install the module using pip:
$ pip install ner-d
Download the latest ner-d library from https://github.com/verifid/ner-d and install the module using pip:
$ pip install -e .
Extract the source distribution and run:
$ python setup.py build
$ python setup.py install
Usage
- ner:
from nerd import ner

doc = ner.name("""GitHub launched April 10, 2008, a subsidiary of Microsoft, is an American web-based hosting service for version control using Git. It is mostly used for computer code. It offers all of the distributed version control and source code management (SCM) functionality of Git as well as adding its own features.""", language='en_core_web_sm')
text_label = [(X.text, X.label_) for X in doc]
print(text_label)
# [(u'GitHub', u'ORG'), (u'April 10, 2008', u'DATE'), (u'Microsoft', u'ORG'), (u'American', u'NORP'), (u'Git', u'PERSON'), (u'SCM', u'ORG'), (u'Git', u'PERSON')]
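The list of (text, label) pairs can be awkward to consume directly, for instance because "Git" appears twice. A minimal sketch of one possible post-processing step groups the pairs by label and drops duplicates. The `group_by_label` helper is a hypothetical name and not part of ner-d; the input is copied from the example output above.

```python
from collections import defaultdict


def group_by_label(text_label):
    """Map each label to its unique entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in text_label:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the example output above.
text_label = [
    ("GitHub", "ORG"), ("April 10, 2008", "DATE"), ("Microsoft", "ORG"),
    ("American", "NORP"), ("Git", "PERSON"), ("SCM", "ORG"), ("Git", "PERSON"),
]
print(group_by_label(text_label))
```

The same function should work on the `text_label` list built from `doc`, since it only relies on each item being a (text, label) pair.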
- cli:
# Download the language model
python -m nerd -d en_core_web_sm
# Load the language model
python -m nerd -l en_core_web_sm
# Find entities in text
python -m nerd -n "GitHub launched April 10, 2008, a subsidiary of Microsoft, is an American web-based hosting service for version control using Git.
It is mostly used for computer code. It offers all of the distributed version control and source code management (SCM) functionality
of Git as well as adding its own features."
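The download step only needs to happen once per environment. The following is a rough sketch of a "download only if missing" check, assuming the model is an installable Python package as spaCy models such as en_core_web_sm are. It does not reproduce ner-d's internal logic; `ensure_model` and `_spacy_download` are hypothetical names, and only `importlib.util.find_spec` and `spacy.cli.download` are existing library calls.

```python
import importlib.util


def _spacy_download(model_name):
    # Assumption: the model is a spaCy package; spacy.cli.download installs it via pip.
    from spacy.cli import download
    download(model_name)


def ensure_model(model_name, downloader=_spacy_download):
    """Download model_name only if it is not importable yet.

    Returns True if a download was triggered, False otherwise.
    """
    if importlib.util.find_spec(model_name) is not None:
        return False  # already installed, nothing to do
    downloader(model_name)
    return True
```

Calling `ensure_model("en_core_web_sm")` would then trigger a download only on the first run. The `downloader` parameter is injectable so the check can be exercised without network access.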