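I'm working on a university project in which I'm implementing the Lesk algorithm. So far it has gone well: it works correctly for words that have synsets in WordNet.
What I want to do now is use the SemCor corpus, which I have already downloaded.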
Here is my code:
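```python
from nltk.corpus import stopwords as nltk_stopwords
from nltk.corpus import wordnet as wn
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()
# English stop words as a set, so that set.difference() can remove them.
stopwords = set(nltk_stopwords.words("english"))


def overlap_context(synset, sentence):
    # Bag of words built from the sense's definition and its usage examples.
    gloss = set(tokenizer.tokenize(synset.definition().lower()))
    for example in synset.examples():
        # update() adds the example's tokens to gloss in place.
        gloss.update(tokenizer.tokenize(example.lower()))
    gloss = gloss.difference(stopwords)
    if isinstance(sentence, str):
        sentence = set(tokenizer.tokenize(sentence.lower()))
    else:
        # A list or set of tokens.
        sentence = {token.lower() for token in sentence}
    sentence = sentence.difference(stopwords)
    return len(gloss.intersection(sentence))


def lesk_algorithm(word, sentence):
    best_sense = None
    max_overlap = 0
    # Use the WordNet base form of the word if morphy finds one.
    word = wn.morphy(word) or word
    for sense in wn.synsets(word):
        overlap = overlap_context(sense, sentence)
        # Also count overlaps with the glosses of the sense's hyponyms.
        for hyponym in sense.hyponyms():
            overlap += overlap_context(hyponym, sentence)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense
    return best_sense
```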
The code works as long as I pass in the word and the sentence by hand. Since I'm using NLTK's SemCor corpus, how can I run it over 50 sentences from that corpus?
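One possible approach, sketched under assumptions: NLTK's `semcor` reader can return each sentence as a list of chunks via `semcor.tagged_sents(tag="sem")`. A sense-annotated chunk is an `nltk.Tree` whose label is a WordNet `Lemma`; an untagged chunk is a plain list of tokens; some labels are plain strings (for example `"NE"`, or a sense NLTK could not resolve), and those are skipped here. The sketch takes the first `n` sentences, passes every annotated word together with its sentence to a disambiguation function such as `lesk_algorithm`, and counts how often the prediction matches the gold synset. The helper names (`extract_targets`, `evaluate`, `evaluate_on_semcor`) are hypothetical, and the last one assumes the `semcor` and `wordnet` data packages are installed.

```python
from itertools import islice


def extract_targets(tagged_sent):
    """Split one sentence from semcor.tagged_sents(tag="sem") into its
    tokens and its (word, gold_synset) targets."""
    tokens, targets = [], []
    for chunk in tagged_sent:
        if hasattr(chunk, "label"):
            # nltk.Tree chunk: the label is a WordNet Lemma, or a plain
            # string such as "NE" or an unresolved sense name.
            words = chunk.leaves()
            gold = chunk.label()
            if hasattr(gold, "synset"):
                # Multiword expressions are joined with "_", as in WordNet.
                targets.append(("_".join(words), gold.synset()))
        else:
            # Untagged chunk: a plain list of tokens.
            words = list(chunk)
        tokens.extend(words)
    return tokens, targets


def evaluate(disambiguate, tagged_sents):
    """Return (correct, total) for disambiguate(word, tokens) -> Synset or None."""
    correct = total = 0
    for tagged_sent in tagged_sents:
        tokens, targets = extract_targets(tagged_sent)
        for word, gold_synset in targets:
            total += 1
            predicted = disambiguate(word, tokens)
            # lesk_algorithm returns None when no sense overlaps the context.
            if predicted is not None and predicted == gold_synset:
                correct += 1
    return correct, total


def evaluate_on_semcor(disambiguate, n=50):
    """Evaluate disambiguate on the first n sense-tagged SemCor sentences."""
    # Imported here so the helpers above can be used without the SemCor data.
    from nltk.corpus import semcor
    first_n = islice(semcor.tagged_sents(tag="sem"), n)
    return evaluate(disambiguate, first_n)
```

With the functions from the question in scope, `correct, total = evaluate_on_semcor(lesk_algorithm, n=50)` gives the number of sense-annotated words in the first 50 sentences that were disambiguated correctly, and `correct / total` is the accuracy. If only the raw text is needed, `semcor.sents()` returns each sentence as a plain list of tokens, which can be passed to `lesk_algorithm` directly.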