<p>Stanford's CoreNLP now has an <a href="https://github.com/stanfordnlp/stanfordnlp" rel="nofollow noreferrer">official Python binding</a> called StanfordNLP, which you can read about on the <a href="https://stanfordnlp.github.io/stanfordnlp/#get-started" rel="nofollow noreferrer">StanfordNLP website</a>.</p>
<p>The native API <a href="https://stanfordnlp.github.io/stanfordnlp/processors.html" rel="nofollow noreferrer">doesn't seem</a> to support the coref processor yet, but you can use the CoreNLPClient interface to call the "standard" CoreNLP (the original Java software) from Python.</p>
<p>So, after setting up the Python wrapper by following the instructions <a href="https://stanfordnlp.github.io/stanfordnlp/#get-started" rel="nofollow noreferrer">here</a>, you can get coreference chains like this:</p>
<pre class="lang-py prettyprint-override"><code>from stanfordnlp.server import CoreNLPClient

text = 'Barack was born in Hawaii. His wife Michelle was born in Milan. He says that she is very smart.'
print(f"Input text: {text}")

# Set up the client
client = CoreNLPClient(properties={'annotators': 'coref', 'coref.algorithm': 'statistical'}, timeout=60000, memory='16G')

# Submit the request to the server
ann = client.annotate(text)

mychains = list()
chains = ann.corefChain
for chain in chains:
    mychain = list()
    # Loop through every mention of this chain
    for mention in chain.mention:
        # Get the sentence in which this mention is located, and get the words which are part of this mention
        # (a mention can span more than one word: it can be a pronoun like "he",
        # but also a compound noun phrase like "His wife Michelle")
        words_list = ann.sentence[mention.sentenceIndex].token[mention.beginIndex:mention.endIndex]
        # Build a string out of the words of this mention
        ment_word = ' '.join([x.word for x in words_list])
        mychain.append(ment_word)
    mychains.append(mychain)

for chain in mychains:
    print(' &lt;-&gt; '.join(chain))
</code></pre>
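<p>Once the chains are plain lists of strings, any further post-processing is ordinary Python and needs no running server. As a small sketch (the chain contents below are hypothetical sample data in the shape of the <code>mychains</code> list built above, not guaranteed model output), here is one way to pick a representative mention per chain, simply taking the longest one:</p>
<pre class="lang-py prettyprint-override"><code># Hypothetical coreference chains, shaped like the `mychains` list above.
# (Illustrative sample data; the actual chains depend on the CoreNLP model.)
mychains = [
    ['Barack', 'His', 'He'],
    ['His wife Michelle', 'she'],
]

# Pick one representative mention per chain: here, simply the longest string.
representatives = [max(chain, key=len) for chain in mychains]
print(representatives)  # ['Barack', 'His wife Michelle']
</code></pre>
<p>A real representative-mention heuristic would usually prefer proper nouns over pronouns; the longest-string rule is just the simplest stand-in.</p>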