A Python wrapper for Stanford CoreNLP


Pynlp

A Pythonic wrapper for Stanford CoreNLP.

Description

This library provides a Python interface to Stanford CoreNLP.

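Installation

  1. Download Stanford CoreNLP from the download page on the official website.
  2. Unzip the file and set the CORE_NLP environment variable to point to the unzipped directory.
  3. Install pynlp from pip: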
pip3 install pynlp
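Before starting the server, it can help to confirm that step 2 took effect. The following is a minimal sketch, assuming only that the variable is named CORE_NLP as stated above. The helper name `corenlp_home` is made up for illustration and is not part of pynlp.

```python
import os
from pathlib import Path


def corenlp_home(env=None):
    """Return the directory named by CORE_NLP, or None if unset or not a directory.

    Hypothetical helper for illustration; not part of pynlp.
    """
    env = os.environ if env is None else env
    value = env.get('CORE_NLP')
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_dir() else None


home = corenlp_home()
if home is None:
    print('CORE_NLP is unset or does not point to a directory')
else:
    print('CoreNLP directory:', home)
```

Passing a plain dict as `env` makes the check easy to try out without touching the real environment.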

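Quick start

Start the server

Launch the StanfordCoreNLPServer by following the instructions in the official CoreNLP server documentation. Alternatively, simply run the module: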

python3 -m pynlp

By default, this launches the server on localhost, using port 9000 and 4 GB of RAM for the JVM. Use the --help option for instructions on custom configurations.
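To check that something is listening on the default address before annotating, a plain TCP probe is usually enough. This is a rough sketch using only the standard library; `server_is_up` is a hypothetical helper, not a pynlp function, and it only tests that the port accepts connections, not that the listener is really a CoreNLP server.

```python
import socket


def server_is_up(host='localhost', port=9000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only checks that the port is open and does
    not verify that the listener is a CoreNLP server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print('server reachable:', server_is_up())
```

If the server was started on a different port through the custom configuration options, pass that port explicitly.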

Example

Let's start with an excerpt from a CNN article.

text = ('GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
        'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
        'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
        'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
        'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
        'was being held in the Warren County Regional Jail on a $5,000 bond.')

Instantiate the annotator

Here we demonstrate the following annotators:

  • annotators: tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie
  • options: openie.resolve_coref

from pynlp import StanfordCoreNLP

annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie'
options = {'openie.resolve_coref': True}

nlp = StanfordCoreNLP(annotators=annotators, options=options)
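The annotators argument is a single comma-separated string, which is easy to mistype. One option, sketched here with the standard library only, is to keep the names in a list and join them. The variable name `annotator_names` is illustrative and not something pynlp requires.

```python
# Annotator names in the order used above; keeping them in a list makes it
# easy to add or drop one without editing a long string by hand.
annotator_names = [
    'tokenize', 'ssplit', 'pos', 'lemma', 'ner',
    'entitymentions', 'coref', 'sentiment', 'quote', 'openie',
]

# Build the comma-separated string in the same format as the example above.
annotators = ', '.join(annotator_names)
options = {'openie.resolve_coref': True}

print(annotators)
print(options)
```

The resulting `annotators` and `options` values are identical to the ones passed to StanfordCoreNLP above.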

Annotate text

The nlp instance is callable. Use it to annotate the text; it returns a Document object.

document = nlp(text)

print(document)  # prints 'text'

Sentence splitting

Let's test the ssplit annotator. A Document object iterates over its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.

Named entity recognition

How about finding all the people mentioned in the document?

[str(entity) for entity in document.entities if entity.type == 'PERSON']

Output:

Out[2]: ['Rand Paul', 'Rene Albert Boucher', 'Paul', 'Boucher']

We can also use named entities at the sentence level.

first_sentence = document[0]

for entity in first_sentence.entities:
    print(entity, '({})'.format(entity.type))

Output:

GOP (ORGANIZATION)
Rand Paul (PERSON)
Bowling Green (LOCATION)
Kentucky (LOCATION)
Friday (DATE)
Kentucky State Police (ORGANIZATION)
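Once entities are available as text and type, grouping them by type is a common next step. The sketch below works on plain (text, type) pairs copied from the output above, so it runs without a server. With live pynlp objects the pairs would presumably be built from str(entity) and entity.type, the two accessors used in the examples above. The function name `group_by_type` is illustrative.

```python
from collections import defaultdict

# (text, type) pairs copied from the sentence-level output above.
entities = [
    ('GOP', 'ORGANIZATION'),
    ('Rand Paul', 'PERSON'),
    ('Bowling Green', 'LOCATION'),
    ('Kentucky', 'LOCATION'),
    ('Friday', 'DATE'),
    ('Kentucky State Police', 'ORGANIZATION'),
]


def group_by_type(pairs):
    """Map each entity type to the list of entity texts, in input order."""
    groups = defaultdict(list)
    for text, entity_type in pairs:
        groups[entity_type].append(text)
    return dict(groups)


by_type = group_by_type(entities)
for entity_type, names in by_type.items():
    print(entity_type, names)
```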

Part-of-speech tagging

Let's find all the 'VB' tags in the first sentence. A Sentence object iterates over Token objects.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, token.pos)

Output:

was VBD
assaulted VBN
according VBG
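The substring test used above works because Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all begin with "VB". A prefix test states that intent more directly. The sketch below uses plain (word, tag) pairs so it runs without a server: the three verb tags come from the output above, while the tags on the other words are hand-written assumptions for illustration.

```python
# (word, tag) pairs. The verb tags match the output above; the remaining
# tags are hand-written assumptions, not taken from a CoreNLP run.
tagged = [
    ('Paul', 'NNP'), ('was', 'VBD'), ('assaulted', 'VBN'),
    ('in', 'IN'), ('his', 'PRP$'), ('home', 'NN'),
    ('according', 'VBG'),
]


def verbs(pairs):
    """Keep only the pairs whose tag starts with the verb prefix 'VB'."""
    return [(word, tag) for word, tag in pairs if tag.startswith('VB')]


for word, tag in verbs(tagged):
    print(word, tag)
```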

Lemmatization

Using the same words, let's look at the lemmas.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, '->', token.lemma)

Output:

was -> be
assaulted -> assault
according -> accord

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]

print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

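In the string representation, coreferences are marked with parentheses and the referent with double parentheses. Each is also labelled with a coref_id. Let's take a closer look at the referent.

ref = chain.referent

print('Coreference: {}\n'.format(ref))

for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent

Output: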

Coreference: Police

type: PROPER
number: SINGULAR
animacy: ANIMATE
gender: UNKNOWN
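The bracketed string representation of a chain printed earlier can also be picked apart with a regular expression, which is handy when only the printed text is at hand. This is a sketch that assumes the format is exactly as shown in that output (single parentheses for a coreference, double for the referent, followed by -[id=N]). It is not how pynlp itself exposes mentions, and `parse_mentions` is a hypothetical name.

```python
import re

# One or more '(' , the mention text, one or more ')' , then '-[id=N]'.
MENTION = re.compile(r'(\(+)([^()]+)\)+-\[id=(\d+)\]')


def parse_mentions(line):
    """Extract mentions from one line of a printed chain.

    Assumes the format shown in the output above; double parentheses
    are taken to mark the referent.
    """
    return [
        {'text': text, 'id': int(coref_id), 'is_referent': len(opening) == 2}
        for opening, text, coref_id in MENTION.findall(line)
    ]


# First line of the chain output shown earlier.
line = ('((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in '
        'Bowling Green, Kentucky, on Friday, according to Kentucky State Police.')

for mention in parse_mentions(line):
    print(mention)
```

When a live chain object is available, indexing it by id as in the assert above is the more direct route.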

Quotes

Extracting quotes from the text is simple.

print(document.quotes)

Output:

[<Quote: "intentionally assaulted">, <Quote: "minor injury">]
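For comparison, a naive baseline that finds straight double-quoted spans with a regular expression gives the same two strings on this sentence. This is only a sketch for sanity-checking results without a server. It is not what the CoreNLP quote annotator does, and it will not handle nested or curly quotes. `naive_quotes` is a hypothetical helper.

```python
import re


def naive_quotes(text):
    """Return the contents of straight double-quoted spans, in order.

    A deliberately simple baseline, not the CoreNLP quote annotator.
    """
    return re.findall(r'"([^"]*)"', text)


# Third sentence of the example text above.
sentence = ('Police arrested a man named Rene Albert Boucher, who they allege '
            '"intentionally assaulted" Paul, causing him "minor injury".')

print(naive_quotes(sentence))
```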

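TODO (annotation wrappers):

  • [X] ssplit
  • [ ] ner
  • [X] pos
  • [X] lemma
  • [X] coref
  • [X] quote
  • [ ] quote.attribution
  • [ ] parse
  • [ ] depparse
  • [X] entitymentions
  • [ ] openie
  • [ ] sentiment
  • [ ] relation
  • [ ] kbp
  • [ ] entitylink
  • [ ] 'options' examples, i.e. openie.resolve_coref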

Saving annotations

Write

A pynlp document can be saved as a byte string.

with open('annotation.dat', 'wb') as file:
    file.write(document.to_bytes())

Read

To load a pynlp document, instantiate a Document using the from_bytes class method.

from pynlp import Document

with open('annotation.dat', 'rb') as file:
    document = Document.from_bytes(file.read())
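The write and read steps can be wrapped in two small helpers and exercised as a round trip. The sketch below relies only on the to_bytes and from_bytes methods shown above. To stay runnable without a server it uses `FakeDocument`, a hypothetical stand-in class that is not pynlp's Document, and the helper names are likewise illustrative.

```python
import tempfile
from pathlib import Path


def save_annotation(document, path):
    """Write document.to_bytes() to path and return the number of bytes written."""
    return Path(path).write_bytes(document.to_bytes())


def load_annotation(document_cls, path):
    """Rebuild a document by passing the file's bytes to document_cls.from_bytes."""
    return document_cls.from_bytes(Path(path).read_bytes())


class FakeDocument:
    """Hypothetical stand-in with the same two methods; NOT pynlp's Document."""

    def __init__(self, text):
        self.text = text

    def to_bytes(self):
        return self.text.encode('utf-8')

    @classmethod
    def from_bytes(cls, data):
        return cls(data.decode('utf-8'))


# Round trip through a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'annotation.dat'
    written = save_annotation(FakeDocument('Rand Paul'), target)
    restored = load_annotation(FakeDocument, target)

print(written, restored.text)
```

With pynlp installed, the same helpers would presumably be called with a real document and the Document class in place of the stand-in.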
