How do I split a sentence into relevant terms (term extraction)?

2 answers

User

#1 · edited 2024-06-08 14:45:22

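Check out the spaCy library (see the link).

It has no out-of-the-box function for this, because you need to build the rules yourself. The rules are very readable, though, and you can feed in many options (POS tags, regex, lemma, or any combination of them, and so on).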
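As a rough illustration of what such rules can look like, here is a minimal sketch using spaCy's token-based `Matcher`. It assumes spaCy v3 and a blank English pipeline (tokenizer only), so it sticks to `LOWER` and `REGEX` rules; POS or lemma rules would additionally need a trained pipeline. The rule names `ML_TERM` and `ACRONYM` and the sample sentence are made up for the example.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy v3. A blank pipeline only tokenizes, which is enough
# for rules on the token text; POS/LEMMA rules need a trained pipeline.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Rule 1 (hypothetical): the two-token phrase "machine learning", case-insensitive.
matcher.add("ML_TERM", [[{"LOWER": "machine"}, {"LOWER": "learning"}]])
# Rule 2 (hypothetical): any all-caps token of two or more letters, a crude acronym rule.
matcher.add("ACRONYM", [[{"TEXT": {"REGEX": r"^[A-Z]{2,}$"}}]])

doc = nlp("Machine learning powers many NLP systems")

# Each match is (match_id, start_token, end_token); map the id back to the rule name.
rule_hits = [(nlp.vocab.strings[match_id], doc[start:end].text)
             for match_id, start, end in matcher(doc)]
print(rule_hits)
```

Each rule is a list of per-token dictionaries, so combining conditions means adding more keys to a token's dictionary or more tokens to the list.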

The section on the PhraseMatcher class is particularly worth a look.

Here is a code example copied straight from the documentation:

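import spacy
from spacy.matcher import PhraseMatcher

# Note: this is the spaCy v2 API. In spaCy v3 the 'en' shortcut was removed
# (load a named pipeline such as 'en_core_web_sm' instead), and add() takes
# the patterns as a list: matcher.add('TerminologyList', patterns)
nlp = spacy.load('en')
matcher = PhraseMatcher(nlp.vocab)
terminology_list = ['Barack Obama', 'Angela Merkel', 'Washington, D.C.']
# Turn each term into a Doc so it is tokenized the same way as the input text
patterns = [nlp(text) for text in terminology_list]
matcher.add('TerminologyList', None, *patterns)

doc = nlp("German Chancellor Angela Merkel and US President Barack Obama "
          "converse in the Oval Office inside the White House in Washington, D.C.")
matches = matcher(doc)  # list of (match_id, start, end) token offsets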
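The example stops at `matches`, which only holds token offsets. A minimal sketch of turning those offsets back into the matched terms might look like the following. It assumes spaCy v3 (note the list-style `matcher.add`) and uses a blank English pipeline so no model download is needed; the variable name `found_terms` is made up for the example.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: spaCy v3. PhraseMatcher only needs tokenization for
# exact-text matching, so a blank pipeline is sufficient here.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
terminology_list = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
# make_doc runs only the tokenizer, which is cheaper than the full pipeline
patterns = [nlp.make_doc(text) for text in terminology_list]
matcher.add("TerminologyList", patterns)

doc = nlp("German Chancellor Angela Merkel and US President Barack Obama "
          "converse in the Oval Office inside the White House in Washington, D.C.")

# Each match is (match_id, start_token, end_token); slicing the doc gives the span.
found_terms = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(found_terms)
```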

User

#2 · edited 2024-06-08 14:45:22

To automatically detect common phrases from a stream of sentences, I suggest looking at Gensim's phrase (collocation) detection.

Here is a good example:

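from gensim.models.phrases import Phrases, Phraser

# sentence_stream is assumed here: an iterable of tokenized sentences to train on
phrases = Phrases(sentence_stream)
bigram = Phraser(phrases)
sent = ['the', 'mayor', 'of', 'new', 'york', 'was', 'there']
print(bigram[sent])
# Output (depends on the training corpus): ['the', 'mayor', 'of', 'new_york', 'was', 'there']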
