Python pynlp package - PyPI

A Python wrapper for Stanford CoreNLP

pynlp project description

Pynlp

A pythonic wrapper for Stanford CoreNLP.

Description

This library provides a Python interface to Stanford CoreNLP, built on top of ^{}.

Installation

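Download Stanford CoreNLP from the official website's download page.
Unzip the file and set the CORE_NLP environment variable to point to the unzipped directory.
Install pynlp with pip: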

pip3 install pynlp
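Before starting the server, it can help to confirm that CORE_NLP really points at the unzipped CoreNLP directory. The following is a minimal sketch using only the standard library. The helper name `corenlp_home` is hypothetical and not part of pynlp. The check that the directory contains `.jar` files is an assumption about the layout of the CoreNLP download.

```python
import os
from pathlib import Path


def corenlp_home(env_var: str = "CORE_NLP") -> Path:
    """Return the directory named by `env_var`, or raise if it looks wrong.

    Hypothetical helper: it only checks that the variable is set, that it
    names an existing directory, and that the directory holds .jar files
    (an assumed property of the CoreNLP download).
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{env_var}={value!r} is not a directory")
    if not any(path.glob("*.jar")):
        raise RuntimeError(f"no .jar files found in {path}")
    return path


if __name__ == "__main__":
    try:
        print("CoreNLP found at", corenlp_home())
    except RuntimeError as err:
        print("CoreNLP setup problem:", err)
```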

Quick start

Starting the server

Start the StanfordCoreNLPServer by following the instructions given here. Alternatively, just run the module:

python3 -m pynlp

By default, this starts the server on localhost, port 9000, with 4 GB of RAM for the JVM. Use the --help option for instructions on custom configuration.
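Before sending text to the server, you can check that something is listening on the default port. This is a minimal sketch using only the standard library, and `server_is_listening` is a hypothetical helper rather than a pynlp function. It only shows that some process accepts TCP connections on that port. It does not confirm that the process is a working CoreNLP server.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 9000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only proves that a process is listening; it does not check that
    the process is a working CoreNLP server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host not found, ...
        return False


if __name__ == "__main__":
    # 9000 is the default port used by `python3 -m pynlp`, as described above.
    print("CoreNLP port open:", server_is_listening())
```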

Example

Let's start with an excerpt from a CNN article.

text = (
    'GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
    'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
    'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
    'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
    'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
    'was being held in the Warren County Regional Jail on a $5,000 bond.'
)

Instantiating annotators

Here we demonstrate the following:

annotators: tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie
options: openie.resolve_coref

from pynlp import StanfordCoreNLP

annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie'
options = {'openie.resolve_coref': True}

nlp = StanfordCoreNLP(annotators=annotators, options=options)
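The `annotators` argument above is a single comma-separated string. If you prefer to assemble the pipeline from a list, a small helper can build that string. This is useful, for example, to drop expensive annotators such as coref or openie when you don't need them. `build_annotators` is hypothetical and not part of pynlp. It only trims, joins, and de-duplicates names. It does not check CoreNLP's annotator dependencies, so you still need to keep prerequisites such as tokenize and ssplit in the list.

```python
from typing import Iterable


def build_annotators(names: Iterable[str]) -> str:
    """Join annotator names into the comma-separated form used above.

    Hypothetical helper: strips whitespace, drops empty entries and
    duplicates (keeping first-seen order). It does NOT verify annotator
    prerequisites.
    """
    kept = []
    for name in names:
        name = name.strip()
        if name and name not in kept:
            kept.append(name)
    return ", ".join(kept)


full = build_annotators(
    "tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie".split(",")
)
# A lighter pipeline for when coreference, sentiment and Open IE are not needed.
light = build_annotators(["tokenize", "ssplit", "pos", "lemma", "ner"])
print(full)
print(light)
```

You would then pass the result in the same way as above, for example `StanfordCoreNLP(annotators=light)`.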

Annotating text

The nlp instance is callable. Call it on a string to annotate the text; it returns a Document object.

document = nlp(text)
print(document)  # prints the original text
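In Python, an object becomes callable when its class defines a `__call__` method. The sketch below mimics the shape of the workflow above using hypothetical stand-in classes, `EchoAnnotator` and `FakeDocument`. They perform no annotation and do not reproduce pynlp's internals. They only illustrate why `nlp(text)` works and why printing the result shows the text.

```python
class FakeDocument:
    """Hypothetical stand-in for a Document: it only stores the text."""

    def __init__(self, text: str):
        self.text = text

    def __str__(self) -> str:
        # Mirrors the behaviour noted above: printing the document prints its text.
        return self.text


class EchoAnnotator:
    """Hypothetical stand-in for a callable annotator; it does no NLP."""

    def __init__(self, annotators: str):
        self.annotators = annotators

    def __call__(self, text: str) -> FakeDocument:
        # Calling the instance like a function runs this method.
        return FakeDocument(text)


nlp = EchoAnnotator(annotators="tokenize, ssplit")
doc = nlp("Hello world.")
print(doc)
```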

Sentence splitting

Let's test the ssplit annotator. Iterating over a Document object yields its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.
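Because a Document is iterable, any code that accepts an iterable can consume its sentences. The sketch below produces the numbered format shown above from any iterable. A plain list of strings stands in for the Document, because a real one requires a running CoreNLP server. `numbered` is a hypothetical helper. It assumes that `str()` of a Sentence gives the sentence text, as the printed output suggests.

```python
from typing import Iterable, List


def numbered(sentences: Iterable[object], start: int = 0) -> List[str]:
    """Format each item as 'index) text', matching the output shown above.

    str() is applied to each item, so Sentence objects would work too
    (assumption: their str() is the sentence text).
    """
    return [f"{i}) {s}" for i, s in enumerate(sentences, start=start)]


sentences = [
    "State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.",
    "Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.",
]
for line in numbered(sentences, start=1):
    print(line)
```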

Named entity recognition

How about finding all the people mentioned in the document?

[str(entity) for entity in document.entities if entity.type == 'PERSON']

Output:

['Rand Paul', 'Rene Albert Boucher', 'Paul', 'Boucher']

We can also access named entities at the sentence level.

first_sentence = document[0]
for entity in first_sentence.entities:
    print(entity, '({})'.format(entity.type))

Output:

GOP (ORGANIZATION)
Rand Paul (PERSON)
Bowling Green (LOCATION)
Kentucky (LOCATION)
Friday (DATE)
Kentucky State Police (ORGANIZATION)
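To tally entities by category rather than print them one at a time, you can group the (text, type) pairs. This minimal sketch works on plain tuples copied from the output above. With a live document, you could build the same pairs as `[(str(e), e.type) for e in first_sentence.entities]`, using only the attributes shown in this example. `group_by_type` is a hypothetical helper, not part of pynlp.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_type(mentions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, type) pairs into {type: [text, ...]}, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text, entity_type in mentions:
        groups[entity_type].append(text)
    return dict(groups)


# The (text, type) pairs printed above for the first sentence.
first_sentence_entities = [
    ("GOP", "ORGANIZATION"),
    ("Rand Paul", "PERSON"),
    ("Bowling Green", "LOCATION"),
    ("Kentucky", "LOCATION"),
    ("Friday", "DATE"),
    ("Kentucky State Police", "ORGANIZATION"),
]
print(group_by_type(first_sentence_entities))
```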

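Part-of-speech tagging

Let's find all tokens in the first sentence whose tag contains 'VB' (the verb tags). Iterating over a Sentence object yields its Token objects.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, token.pos)

Output: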

was VBD
assaulted VBN
according VBG
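The `'VB' in token.pos` test above is a substring check. CoreNLP's English models use the Penn Treebank tag set, whose verb tags are VB, VBD, VBG, VBN, VBP and VBZ. No other tag in that set contains "VB", so for standard tags the substring check and an explicit membership check agree. The sketch below uses an explicit set instead, which makes the intent clearer. It runs on hand-written (word, tag) pairs. Only the three verb pairs match the output above; the non-verb tags are illustrative and are not CoreNLP output. `verbs` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# Penn Treebank verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def verbs(tagged: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (word, tag) pairs whose tag is one of the Penn Treebank verb tags."""
    return [(word, tag) for word, tag in tagged if tag in VERB_TAGS]


# Illustrative (word, tag) pairs for part of the first sentence.
tagged = [("GOP", "NNP"), ("Sen.", "NNP"), ("Rand", "NNP"), ("Paul", "NNP"),
          ("was", "VBD"), ("assaulted", "VBN"), ("in", "IN"), ("his", "PRP$"),
          ("home", "NN"), ("according", "VBG")]
for word, tag in verbs(tagged):
    print(word, tag)
```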

Lemmatization

Using the same words, let's look at the lemmas.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, '->', token.lemma)

Output:

was -> be
assaulted -> assault
according -> accord
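Lemmas let you count inflected forms together, so "was" and "being" both fall under "be". The sketch below counts by lemma over plain (word, lemma) pairs. The first three pairs are the output shown above. The last two, "was" and "being" from the final sentence of the excerpt, are added by hand, and their lemmas are assumed rather than taken from CoreNLP. With a live sentence you could build the pairs as `[(str(t), t.lemma) for t in first_sentence]`. `lemma_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def lemma_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count occurrences by lemma rather than by surface form."""
    return Counter(lemma for _word, lemma in pairs)


pairs = [("was", "be"), ("assaulted", "assault"), ("according", "accord"),
         ("was", "be"), ("being", "be")]  # last two pairs added by hand
print(lemma_counts(pairs).most_common(1))
```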

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]
print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

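In the string representation, each coreference mention is marked with parentheses and the referent with double parentheses; each is also tagged with its coref_id. Let's take a closer look at the referent.

ref = chain.referent
print('Coreference: {}\n'.format(ref))

for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent

Output: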

Coreference: Police

type: PROPER
number: SINGULAR
animacy: ANIMATE
gender: UNKNOWN
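The bracket notation described above is regular enough to parse. This can help when you only have the printed string, for example in saved logs. The sketch below is a hypothetical parser, not part of pynlp. For a live chain, `chain.referent` and indexing by id (`chain[4]`) are more reliable than parsing text. The regular expression assumes that mention text never contains parentheses.

```python
import re
from typing import List, Tuple

# A mention looks like "(text)-[id=N]"; the referent uses double parentheses,
# "((text))-[id=N]". Assumes mention text contains no parentheses.
MENTION = re.compile(r"\((\(?)([^()]*?)\)?\)-\[id=(\d+)\]")


def parse_mentions(chain_text: str) -> List[Tuple[str, int, bool]]:
    """Return (mention text, coref id, is_referent) for each marked mention."""
    return [(text, int(cid), bool(opener))
            for opener, text, cid in MENTION.findall(chain_text)]


# The first two lines of the chain printed above; with pynlp, use str(chain).
sample = (
    "((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, "
    "Kentucky, on Friday, according to Kentucky State Police.\n"
    "State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday."
)
for mention in parse_mentions(sample):
    print(mention)
```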

Quotes

Extracting quotes from the text is simple.

print(document.quotes)

Output:

[<Quote: "intentionally assaulted">, <Quote: "minor injury">]
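As a rough point of comparison, a regular expression recovers the same two quotes from this excerpt because it uses only straight ASCII double quotes. This is a naive stand-in, not what CoreNLP's quote annotator does; that annotator is considerably more sophisticated, so prefer `document.quotes` for real text. `naive_quotes` is a hypothetical helper.

```python
import re
from typing import List

# Straight ASCII double quotes only; nested, curly or single quotes are ignored.
DOUBLE_QUOTED = re.compile(r'"([^"]*)"')


def naive_quotes(text: str) -> List[str]:
    """Return the contents of each double-quoted span in text."""
    return DOUBLE_QUOTED.findall(text)


excerpt = ('Police arrested a man named Rene Albert Boucher, who they allege '
           '"intentionally assaulted" Paul, causing him "minor injury".')
print(naive_quotes(excerpt))  # ['intentionally assaulted', 'minor injury']
```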

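TODO (annotation wrappers):

[X] ssplit
[ ] ner
[X] pos
[X] lemma
[X] coref
[X] quote
[ ] quote.attribution
[ ] parse
[ ] depparse
[X] entitymentions
[ ] openie
[ ] sentiment
[ ] relation
[ ] kbp
[ ] entitylink
[ ] options examples, e.g. openie.resolve_coref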

Saving annotations

Writing

pynlp documents can be saved as byte strings.

with open('annotation.dat', 'wb') as file:
    file.write(document.to_bytes())

Reading

To load a pynlp document, instantiate a Document with the from_bytes class method.

from pynlp import Document

with open('annotation.dat', 'rb') as file:
    document = Document.from_bytes(file.read())
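The snippets above store one document per file. To keep several annotated documents in a single file, one common approach is length-prefixed framing: write each byte string after a fixed-size length header. This is a general technique, not a pynlp feature, and the sketch does not assume anything about whether the serialized format delimits itself. `write_records` and `read_records` are hypothetical helpers, and the payloads below are placeholders. With pynlp, the records would be `document.to_bytes()` values, and each record read back would go to `Document.from_bytes()`.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator

# Each record is a 4-byte big-endian unsigned length followed by that many bytes.
_LEN = struct.Struct(">I")


def write_records(stream: BinaryIO, records: Iterable[bytes]) -> None:
    """Write byte strings to a binary stream, each with a length prefix."""
    for record in records:
        stream.write(_LEN.pack(len(record)))
        stream.write(record)


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the byte strings written by write_records, in order."""
    while True:
        header = stream.read(_LEN.size)
        if not header:
            return  # clean end of stream
        if len(header) < _LEN.size:
            raise ValueError("truncated length prefix")
        (length,) = _LEN.unpack(header)
        record = stream.read(length)
        if len(record) < length:
            raise ValueError("truncated record")
        yield record


# Placeholder payloads; an in-memory buffer stands in for a file opened with 'wb'/'rb'.
buffer = io.BytesIO()
write_records(buffer, [b"first document", b"", b"third document"])
buffer.seek(0)
print(list(read_records(buffer)))
```

With real files, pass handles opened with 'wb' and 'rb' in the same way as the snippets above.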

pynlp 0.4.2
