Python if statement using nltk.wordnet.synsets

2 votes

1 answer

568 views

Asked 2025-04-17 18:49

import nltk
from nltk import *
from nltk.corpus import wordnet as wn

output=[]
wordlist=[]

entries = nltk.corpus.cmudict.entries()

for entry in entries[:200]: # build a list of words without the pronunciations, since pos_tag only works on a list
    wordlist.append(entry[0])

for word in nltk.pos_tag(wordlist): #create a list of nouns
    if(word[1]=='NN'):
        output.append(word[0])

for word in output:
    x = wn.synsets(word) # remove all words that have no synsets (this is the problem)
    if len(x)<1:
        output.remove(word)

for word in output[:200]:
    print (word," ",len(wn.synsets(word)))

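I want to remove every word that has no synsets, but for some reason this doesn't work. After running the program, I see that some words with len(wn.synsets(word)) == 0 are still in my list. Can someone tell me what is going wrong?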

if-statement nltk wordnet synset

1 Answer

You can't safely remove items from a list while you are iterating over it. Here is a simple example that shows the problem:

In [73]: output = list(range(10))

In [74]: for item in output:
   ....:     output.remove(item)

You might expect every item in output to be removed. In fact, half of them are still there:

In [75]: output
Out[75]: [1, 3, 5, 7, 9]

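Why you can't iterate and remove at the same time:

Imagine that Python internally uses a counter to remember the index of the current item as it goes through the for-loop.

When the counter equals 0 (the first time through the loop), Python executes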

output.remove(item)

Fine. output now has one fewer item. But then Python increments the counter to 1, so the next value of item is output[1], which is actually the third item of the original list.

0  <-- first item removed
1  <-- the new output[0] ** THIS ONE GETS SKIPPED **
2  <-- the new output[1] -- gets removed on the next iteration
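
To make the skipping visible, here is a minimal sketch that records which items the loop actually visits. The helper name remove_while_iterating is made up for this illustration and is not part of the original code:

```python
def remove_while_iterating(items):
    """Reproduce the bug: remove each visited item from the list being iterated."""
    items = list(items)  # work on our own list so the caller's data is untouched
    visited = []
    for item in items:
        visited.append(item)  # record which items the loop actually saw
        items.remove(item)    # shrinks the list; every later item shifts left by one
    return visited, items


visited, remaining = remove_while_iterating(range(10))
print(visited)    # [0, 2, 4, 6, 8]
print(remaining)  # [1, 3, 5, 7, 9]
```

Every second item is never visited, so it is never tested and never removed.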

The fix:

Instead, you can iterate over a copy of output, or build a new list. In this case, I think building a new list is more efficient:

new_output = []
for word in output:
    x = wn.synsets(word) 
    if len(x)>=1:
        new_output.append(word)
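
For comparison, here is a rough sketch of the copy-based variant next to a list comprehension that builds the new list in one line. To keep it runnable without the WordNet data, it uses a made-up synsets function and FAKE_SYNSETS table as stand-ins for wn.synsets; in the real program you would call wn.synsets(word) instead:

```python
# Hypothetical stand-in for wn.synsets so the sketch runs without WordNet data.
FAKE_SYNSETS = {"dog": ["dog.n.01"], "cat": ["cat.n.01"]}


def synsets(word):
    """Return the fake synset list for a word, or an empty list if unknown."""
    return FAKE_SYNSETS.get(word, [])


words = ["dog", "aaronson", "cat", "abbado"]

# Option 1: iterate over a shallow copy (output[:]) and remove from the original.
output = list(words)
for word in output[:]:
    if not synsets(word):  # an empty list is falsy, so this means "no synsets"
        output.remove(word)

# Option 2: build a new list with a list comprehension.
filtered = [word for word in words if synsets(word)]

print(output)    # ['dog', 'cat']
print(filtered)  # ['dog', 'cat']
```

Both give the same result. The comprehension avoids the repeated remove calls, each of which has to search the list.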

Answered 2025-04-17 by Python大师
