How can I check for incomplete words with NLTK WordNet in Python?

Asked 2025-04-17 21:48 by 数据工程师

I have a set of words:

{corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise}

I want to compute the semantic similarity between these words, but I've run into a problem:

Some of the words are incomplete, such as "vhicule". How can I ignore them?

Sample code, from "Python: Passing variables into Wordnet Synsets methods in NLTK":

import itertools as IT

import nltk.corpus as corpus

if __name__ == "__main__":
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree"]

    # Compare every word in list1 with every word in list2
    for word1, word2 in IT.product(list1, list2):
        # synsets() returns a list; [0] takes the first listed sense.
        # This raises IndexError when the word has no synsets in WordNet.
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # In NLTK 3, Synset.name is a method and must be called
        print('{w1}, {w2}: {s}'.format(
            w1=wordFromList1.name(),
            w2=wordFromList2.name(),
            s=wordFromList1.wup_similarity(wordFromList2)))

If I add "vhicule" to either list, I get the following error:

IndexError: list index out of range

How can I use this error to skip the words that don't exist in the WordNet database?
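The IndexError is raised because `wordnet.synsets()` returns an empty list for a word WordNet does not know, and indexing `[0]` into an empty list fails. One way to use that error directly, as asked, is to catch it. This is a minimal sketch, assuming NLTK is installed and the WordNet data has been downloaded; the helper name `first_synset` and its optional `synsets` parameter (present only so a stand-in lookup can be passed in) are hypothetical, not part of NLTK.

```python
from typing import Any, Callable, Optional, Sequence


def first_synset(
    word: str, synsets: Optional[Callable[[str], Sequence[Any]]] = None
) -> Optional[Any]:
    """Hypothetical helper: first WordNet synset of `word`, or None if there is none."""
    if synsets is None:
        # Assumes NLTK plus its data: nltk.download("wordnet")
        from nltk.corpus import wordnet
        synsets = wordnet.synsets
    try:
        # synsets() returns [] for an unknown word, so [0] raises IndexError
        return synsets(word)[0]
    except IndexError:
        return None
```

In the question's loop, `wordnet.synsets(word1)[0]` could then be replaced by `first_synset(word1)`, with a `continue` whenever either result is `None`.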

Tags: error handling, database query, natural language processing, nltk, semantic similarity, Wordnet, word check, vocabulary completeness

1 Answer

You can check whether nltk.corpus.wordnet.synsets(i) returns a non-empty list of synsets:

>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> y = [i for i in x if len(wn.synsets(i)) > 0]
>>> y
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']

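A more concise way is to rely on truthiness. wn.synsets(i) returns an empty list (not None) for an unknown word, and an empty list is falsy: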

>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> [i for i in x if wn.synsets(i)]
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
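To get back to the original goal of comparing the words, the filter can be combined with the similarity loop. This is a sketch, assuming NLTK with the WordNet data downloaded and that the first listed synset of each word is the sense you want; `known_words` and `pairwise_wup` are hypothetical helper names, and `itertools.combinations` is used here (instead of the question's `product` over two lists) so each pair within one list is compared once.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Lookup = Callable[[str], Sequence[Any]]


def _default_lookup() -> Lookup:
    # Assumes NLTK plus its data: nltk.download("wordnet")
    from nltk.corpus import wordnet
    return wordnet.synsets


def known_words(words: Sequence[str], synsets: Optional[Lookup] = None) -> List[str]:
    """Hypothetical helper: keep only words with at least one WordNet synset."""
    synsets = synsets or _default_lookup()
    # An empty list is falsy, so words unknown to WordNet are dropped
    return [w for w in words if synsets(w)]


def pairwise_wup(
    words: Sequence[str], synsets: Optional[Lookup] = None
) -> Dict[Tuple[str, str], Optional[float]]:
    """Hypothetical helper: Wu-Palmer similarity between the first synsets
    of every pair of known words, keyed by (word_a, word_b)."""
    synsets = synsets or _default_lookup()
    first = {w: synsets(w)[0] for w in known_words(words, synsets)}
    # Dict keys keep input order, so pairs follow the order of `words`
    return {
        (a, b): first[a].wup_similarity(first[b])
        for a, b in combinations(first, 2)
    }
```

With the lowercased list `x` from above, `pairwise_wup(x)` would drop "distingus" and "vhicule" before any similarity is computed, so no IndexError can occur.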

Answered 2025-04-17 by Python大师
