A Python wrapper for Stanford CoreNLP



Pynlp


A pythonic wrapper for Stanford CoreNLP.

Description

This library provides a Python interface to Stanford CoreNLP, built on top of ^{}.

Installation

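  1. Download Stanford CoreNLP from the download page of the official website.
  2. Unzip the file and set the CORE_NLP environment variable to point to the extracted directory.
  3. Install pynlp from pip: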
pip3 install pynlp
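Step 2 is easy to get subtly wrong (unset variable, path to the zip instead of the extracted folder). A minimal pre-flight check, assuming the unpacked CoreNLP distribution keeps its .jar files at the top level of the directory; `corenlp_problem` is a hypothetical helper, not part of pynlp.

```python
import os
from pathlib import Path


def corenlp_problem(env=None):
    """Pre-flight check for CORE_NLP (hypothetical helper, not part of pynlp).

    Returns None if CORE_NLP names a directory containing at least one
    .jar file, otherwise a short description of what is wrong.
    """
    env = os.environ if env is None else env
    value = env.get("CORE_NLP")
    if not value:
        return "CORE_NLP is not set"
    path = Path(value).expanduser()
    if not path.is_dir():
        return f"CORE_NLP is not a directory: {path}"
    # Assumption: the distribution's jars sit directly inside this directory.
    if not any(path.glob("*.jar")):
        return f"no .jar files in {path}"
    return None
```

Calling `corenlp_problem()` with no argument checks the real environment; passing a dict makes it easy to test.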

Quick start

Starting the server

Start the StanfordCoreNLPServer using the instructions given here, or simply run the module:

python3 -m pynlp

By default, this starts the server on localhost, port 9000, with 4 GB of RAM for the JVM. Use the --help option for instructions on custom configuration.
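If you start the server by hand rather than through the module, the CoreNLP server documentation uses a command of the form java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000. The sketch below only assembles that argument list with the defaults mentioned above (port 9000, 4 GB); `server_command` is a hypothetical helper, and the 15000 ms timeout comes from the CoreNLP documentation example, not from pynlp.

```python
from pathlib import Path


def server_command(corenlp_dir, port=9000, memory="4g", timeout_ms=15000):
    """Build the java argv for StanfordCoreNLPServer (hypothetical helper).

    Nothing is executed here. The JVM itself expands the "*" classpath
    wildcard, so no shell is needed when the list is passed to subprocess.
    """
    classpath = str(Path(corenlp_dir) / "*")
    return [
        "java",
        f"-mx{memory}",  # maximum JVM heap size
        "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]
```

For example, subprocess.Popen(server_command(os.environ["CORE_NLP"])) would launch the server in the background.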

Example

Let's start with an excerpt from a CNN article.

text = ('GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
        'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
        'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
        'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
        'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
        'was being held in the Warren County Regional Jail on a $5,000 bond.')

Instantiating annotators

Here we demonstrate the following annotators:

  • annotators: tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie
  • options: openie.resolve_coref

from pynlp import StanfordCoreNLP

annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie'
options = {'openie.resolve_coref': True}

nlp = StanfordCoreNLP(annotators=annotators, options=options)
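pynlp passes annotators and options to the server for you. To see roughly what that amounts to, here is a sketch that turns the same two values into the properties URL parameter documented for the CoreNLP server. It assumes JSON output (outputFormat json) for readability, which is not necessarily the format pynlp requests; both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_properties(annotators, options):
    """Merge a comma-separated annotator list and extra options into one dict.

    Java properties are strings, so Python booleans become "true"/"false".
    """
    props = {
        "annotators": ",".join(a.strip() for a in annotators.split(",")),
        "outputFormat": "json",  # assumption: JSON instead of pynlp's own format
    }
    for key, value in options.items():
        props[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return props


def properties_url(props, host="http://localhost:9000"):
    """URL for a CoreNLP server request carrying props in ?properties=."""
    return host + "/?" + urlencode({"properties": json.dumps(props)})
```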

Annotating text

The nlp instance is callable. Use it to annotate the text; it returns a Document object.

document = nlp(text)
print(document)  # prints the original text
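Calling nlp(text) sends the text to the running server. A rough standard-library equivalent, assuming a server on localhost:9000 that accepts the raw text as a POST body and answers in JSON; pynlp itself returns a Document rather than a dict and may use a different wire format, and `build_request` and `annotate_json` are hypothetical names.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_request(text, props, host="http://localhost:9000"):
    """Build (but do not send) a POST request asking CoreNLP to annotate text."""
    url = host + "/?" + urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate_json(text, props, host="http://localhost:9000", timeout=60):
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_request(text, props, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting construction from sending keeps the request inspectable without a server.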

Sentence splitting

Let's test the ssplit annotator. Iterating over a Document object yields its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.
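The same sentence view can be recovered from raw server output. A sketch assuming the CoreNLP server's JSON layout (a sentences list whose tokens carry characterOffsetBegin and characterOffsetEnd); each sentence is sliced out of the original text by the offsets of its first and last tokens. The miniature response below is hand-written for illustration, not real CoreNLP output.

```python
def sentence_texts(text, response):
    """Yield (index, sentence text) pairs from a CoreNLP-style JSON response."""
    for sentence in response["sentences"]:
        tokens = sentence["tokens"]
        begin = tokens[0]["characterOffsetBegin"]
        end = tokens[-1]["characterOffsetEnd"]
        yield sentence["index"], text[begin:end]


def tok(word, begin, end):
    # Only the fields used above; real responses carry many more.
    return {"word": word, "characterOffsetBegin": begin, "characterOffsetEnd": end}


# Hand-written miniature response, not real CoreNLP output.
sample_text = "Police arrested Boucher. He is 59."
sample = {
    "sentences": [
        {"index": 0, "tokens": [tok("Police", 0, 6), tok("arrested", 7, 15),
                                tok("Boucher", 16, 23), tok(".", 23, 24)]},
        {"index": 1, "tokens": [tok("He", 25, 27), tok("is", 28, 30),
                                tok("59", 31, 33), tok(".", 33, 34)]},
    ]
}

for index, sentence in sentence_texts(sample_text, sample):
    print(index, sentence, sep=") ")
```

Slicing by offsets keeps the original spacing and punctuation, which rejoining token words would not.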

Named entity recognition

How about finding all the people mentioned in the document?

[str(entity) for entity in document.entities if entity.type == 'PERSON']

Output:

Out[2]: ['Rand Paul', 'Rene Albert Boucher', 'Paul', 'Boucher']

We can also use named entities at the sentence level.

first_sentence = document[0]
for entity in first_sentence.entities:
    print(entity, '({})'.format(entity.type))

Output:

GOP (ORGANIZATION)
Rand Paul (PERSON)
Bowling Green (LOCATION)
Kentucky (LOCATION)
Friday (DATE)
Kentucky State Police (ORGANIZATION)
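A common next step is to bucket entities by type. A small sketch on plain (text, type) pairs, here the first-sentence entities listed above; with pynlp the pairs could presumably be produced as (str(e), e.type) for e in first_sentence.entities. The helper name `group_by_type` is hypothetical.

```python
from collections import defaultdict


def group_by_type(entities):
    """Group (text, type) pairs into {type: [text, ...]}, keeping order."""
    groups = defaultdict(list)
    for text, entity_type in entities:
        groups[entity_type].append(text)
    return dict(groups)


# The first-sentence entities listed above, copied as plain pairs.
first_sentence_entities = [
    ("GOP", "ORGANIZATION"),
    ("Rand Paul", "PERSON"),
    ("Bowling Green", "LOCATION"),
    ("Kentucky", "LOCATION"),
    ("Friday", "DATE"),
    ("Kentucky State Police", "ORGANIZATION"),
]

print(group_by_type(first_sentence_entities)["LOCATION"])  # ['Bowling Green', 'Kentucky']
```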

Part-of-speech tagging

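Let's find all the tokens in the first sentence whose part-of-speech tag contains "VB", i.e. the verbs. Iterating over a Sentence object yields Token objects.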

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, token.pos)

Output:

was VBD
assaulted VBN
according VBG
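The test 'VB' in token.pos works because every Penn Treebank verb tag begins with VB and no other tag in that set contains it. Spelling the tag set out makes the intent explicit. A sketch on plain (word, tag) pairs; the three verb tags match the output above, while the non-verb tags are hand-assigned for illustration rather than taken from CoreNLP.

```python
# Penn Treebank verb tags: base form, past tense, gerund/present participle,
# past participle, non-3rd-person singular present, 3rd-person singular present.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def verbs(tagged):
    """Keep (word, tag) pairs whose tag is a Penn Treebank verb tag."""
    return [(word, tag) for word, tag in tagged if tag in VERB_TAGS]


# Hand-tagged sample; only the verb tags come from the output above.
tagged = [("Rand", "NNP"), ("Paul", "NNP"), ("was", "VBD"), ("assaulted", "VBN"),
          ("in", "IN"), ("his", "PRP$"), ("home", "NN"), ("according", "VBG")]
print(verbs(tagged))  # [('was', 'VBD'), ('assaulted', 'VBN'), ('according', 'VBG')]
```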

Lemmatization

Using the same words, let's look at the lemmas.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, '->', token.lemma)

Output:

was -> be
assaulted -> assault
according -> accord
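Lemmas are handy for grouping inflected forms of the same word. A sketch that collects, for each lemma, the verb forms seen with it; the input is plain (word, pos, lemma) triples, here copied from the output above. With pynlp they could presumably be built as (str(t), t.pos, t.lemma) for t in first_sentence. The helper name is hypothetical.

```python
from collections import defaultdict


def forms_by_lemma(triples, tag_prefix="VB"):
    """Map each lemma to the distinct surface forms seen with a matching tag.

    triples are (word, pos, lemma); order of first appearance is kept.
    """
    forms = defaultdict(list)
    for word, pos, lemma in triples:
        # The tag check runs first, so non-matching lemmas never get a key.
        if pos.startswith(tag_prefix) and word not in forms[lemma]:
            forms[lemma].append(word)
    return dict(forms)


triples = [("was", "VBD", "be"), ("assaulted", "VBN", "assault"), ("according", "VBG", "accord")]
print(forms_by_lemma(triples))  # {'be': ['was'], 'assault': ['assaulted'], 'accord': ['according']}
```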

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]
print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

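In the string representation, coreferences are marked with parentheses and the referent with double parentheses. Each one is also labelled with its coref_id. Let's take a closer look at the referent.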

ref = chain.referent
print('Coreference: {}\n'.format(ref))

for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent

Output:

Coreference: Police

type: PROPER
number: SINGULAR
animacy: ANIMATE
gender: UNKNOWN
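The bracket notation described above can also be parsed back out of the printed chain, for example when working from logged output. A sketch using a regular expression on the print(chain) text shown earlier; it works on the string, not on pynlp's objects, and assumes mention text never contains parentheses itself. `parse_chain` is a hypothetical name.

```python
import re

# One or more opening parentheses, the mention text, closing parentheses,
# then the "-[id=N]" tag. Exactly two opening parentheses mark the referent.
MENTION = re.compile(r"(\(+)([^()]+)\)+-\[id=(\d+)\]")


def parse_chain(printed):
    """Return (coref_id, mention_text, is_referent) for each tagged mention."""
    return [
        (int(m.group(3)), m.group(2), len(m.group(1)) == 2)
        for m in MENTION.finditer(printed)
    ]


# The printed chain from the output above.
printed = (
    "((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling "
    "Green, Kentucky, on Friday, according to Kentucky State Police.\n"
    "State troopers responded to a call to (the senator's)-[id=10] residence at "
    "3:21 p.m. Friday.\n"
    'Police arrested a man named Rene Albert Boucher, who they allege '
    '"(intentionally assaulted" Paul)-[id=16], causing him "minor injury.'
)

for coref_id, mention, is_referent in parse_chain(printed):
    print(coref_id, mention, "(referent)" if is_referent else "")
```

The first triple, (4, 'GOP Sen. Rand Paul', True), agrees with assert chain[4].is_referent above.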

Quotes

Extracting quotes from the text is simple.

print(document.quotes)

Output:

[<Quote: "intentionally assaulted">, <Quote: "minor injury">]
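For comparison, a deliberately naive fallback that is not what CoreNLP does: a regular expression that only understands straight double quotes. CoreNLP's quote annotator is considerably more thorough, so treat this as a rough sketch; `naive_quotes` is a hypothetical name, and the input sentence is taken from the article excerpt.

```python
import re

# Straight double quotes only, no nesting and no curly quotes.
QUOTE = re.compile(r'"([^"]*)"')


def naive_quotes(text):
    """Return (begin, end, quoted_text) for each "..." span in text."""
    return [(m.start(), m.end(), m.group(1)) for m in QUOTE.finditer(text)]


sentence = ('Police arrested a man named Rene Albert Boucher, who they '
            'allege "intentionally assaulted" Paul, causing him "minor injury".')
print([quoted for _, _, quoted in naive_quotes(sentence)])
# ['intentionally assaulted', 'minor injury']
```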

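TODO (annotation wrappers):

  • [X] ssplit
  • [ ] ner
  • [X] pos
  • [X] lemma
  • [X] coref
  • [X] quote
  • [ ] quote.attribution
  • [ ] parse
  • [ ] depparse
  • [X] entitymentions
  • [ ] openie
  • [ ] sentiment
  • [ ] relation
  • [ ] kbp
  • [ ] entitylink
  • [ ] options examples, i.e. openie.resolve_coref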

Saving annotations

Writing

A pynlp Document can be saved as a byte string.

with open('annotation.dat', 'wb') as file:
    file.write(document.to_bytes())

Reading

To load a pynlp Document, instantiate it with the from_bytes class method:

from pynlp import Document

with open('annotation.dat', 'rb') as file:
    document = Document.from_bytes(file.read())
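Annotations can be slow to recompute, and a crash in the middle of the plain write above would leave a truncated annotation.dat that from_bytes cannot read. A sketch of an atomic variant of the writing step, assuming only that document.to_bytes() returns bytes; `write_atomic` is a hypothetical helper. The data goes to a temporary file in the same directory, which os.replace then swaps into place (atomic on POSIX file systems).

```python
import os
import tempfile
from pathlib import Path


def write_atomic(path, data):
    """Write bytes to path via a temporary file and os.replace.

    Readers see either the old file or the complete new one, never a
    partially written file.
    """
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temporary file on any failure
        raise
```

Usage: write_atomic('annotation.dat', document.to_bytes()). Reading back is unchanged.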
