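I'm having trouble running an entity extraction function. I believe it comes down to a version difference: the working example below runs in NLTK 2.0.4 but not in 3.0. I did change one function call, batch_ne_chunk to ne_chunk_sents, to prevent an error from being raised in 3.0.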

    import nltk

    def package_get_entities(self, text):
        #text = text[0:300]
        entity_names = []
        chunked = self.get_chunked_sentences(text)
        for tree in chunked:
            entity_names.extend(self.extract_entity_names(tree))
        entity_names = list(set(entity_names))
        return entity_names

    def get_chunked_sentences(self, text):
        sentences = nltk.sent_tokenize(text)
        tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
        tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
        chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
        return chunked_sentences

    def extract_entity_names(self, t):
        entity_names = []
        if hasattr(t, 'node') and t.node:
            if t.node == 'NE':
                entity_names.append(' '.join([child[0] for child in t]))
            else:
                for child in t:
                    entity_names.extend(self.extract_entity_names(child))
        return entity_names

Running the function:

Output in 2.0.4: [Abraham Lincoln]
Output in 3.0: []
I had to rewrite part of the code to get it working in 3.0.
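For reference, one change known to be needed when moving from NLTK 2 to NLTK 3 is that the Tree.node attribute was replaced by the Tree.label() method. The following is a minimal sketch of extract_entity_names under that assumption; it is not necessarily the exact rewrite meant above. It is written as a plain function (the self parameter is dropped) and relies only on the object having a callable label(), so leaf (word, tag) tuples are skipped.

```python
def extract_entity_names(t):
    """Collect the text of every subtree labelled 'NE' (NLTK 3 style).

    Assumption: `t` is an nltk.Tree produced with binary=True, or a leaf
    (word, tag) tuple. Only the label() method and iteration are used.
    """
    entity_names = []
    # NLTK 3 trees expose their label through the label() method;
    # leaf tuples have no such method, so they fall through untouched.
    if hasattr(t, 'label') and callable(t.label):
        if t.label() == 'NE':
            # Children of an NE subtree are (word, tag) pairs; join the words.
            entity_names.append(' '.join(child[0] for child in t))
        else:
            # Not an entity itself: search its children recursively.
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names
```

With NLTK 3 installed, this can be called on each tree yielded by nltk.ne_chunk_sents(tagged_sentences, binary=True), in the same way package_get_entities does above.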