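spaCy WordNet
spaCy WordNet is a simple custom component for using WordNet, MultiWordnet and WordNet Domains with spaCy.
The component combines the NLTK WordNet interface with WordNet Domains, allowing users to:
- Get all synsets for a processed token. For example, get all the synsets (word senses) of the word bank.
- Get and filter synsets by domain. For example, get the synonyms of the verb withdraw in the financial domain.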
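Getting started
The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:
Prerequisites
- Python 3.x
- spaCy
You also need to install the following NLTK WordNet data: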
python -m nltk.downloader wordnet
python -m nltk.downloader omw
Install
pip install spacy-wordnet
Usage
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model (supported models are "es" and "en")
nlp = spacy.load('en')
nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
token = nlp('prices')[0]

# The wordnet object links the spaCy token with the NLTK WordNet interface,
# giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# It also tags the token automatically with WordNet domains
token._.wordnet.wordnet_domains()

# Imagine we want to enrich the following sentence with synonyms
sentence = nlp('I want to withdraw 5,000 euros')

# spaCy WordNet lets you find synonyms by domain of interest,
# for example economy
economy_domains = ['finance', 'banking']
enriched_sentence = []

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if synsets:
        lemmas_for_synset = []
        for s in synsets:
            # If we found a synset in the economy domains,
            # we collect its variants
            lemmas_for_synset.extend(s.lemma_names())
        # Add the collected variants to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))
    else:
        enriched_sentence.append(token.text)

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
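The enrichment loop in the usage example can also be factored into a small helper that does not depend on spaCy, which makes the formatting logic easy to test on its own. The following is only a minimal sketch: `enrich`, `lookup` and `toy_lookup` are hypothetical names that are not part of spacy-wordnet, and the entries in the stub table are made up purely to exercise the function.

```python
from typing import Callable, Iterable, List


def enrich(tokens: Iterable[str],
           lookup: Callable[[str], Iterable[str]]) -> str:
    """Replace each token that has variants with '(a|b|c)'; keep the rest."""
    parts: List[str] = []
    for token in tokens:
        # Deduplicate and sort so the output order is deterministic
        variants = sorted(set(lookup(token)))
        if variants:
            parts.append('({})'.format('|'.join(variants)))
        else:
            parts.append(token)
    return ' '.join(parts)


# Stub lookup with made-up entries, used only for illustration.
# With spacy-wordnet, the lookup would instead gather s.lemma_names()
# from token._.wordnet.wordnet_synsets_for_domain(domains).
def toy_lookup(word: str) -> List[str]:
    table = {'withdraw': ['withdraw', 'take_out', 'withdraw']}
    return table.get(word, [])


print(enrich(['I', 'withdraw', 'euros'], toy_lookup))
# I (take_out|withdraw) euros
```

Sorting the variants is a design choice of this sketch: joining a plain `set`, as the usage example does, gives no guaranteed order, so the printed alternatives may appear in a different order from run to run.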