IndexError: index out of bounds

0 votes
1 answer
512 views
Asked 2025-04-18 03:04

I'm using NLTK's FreqDist object to create a cPickle file. For some reason, though, I get an index-out-of-bounds error on line 3 ("cutoff...").

words = [item for sublist in words for item in sublist]
freq = nltk.FreqDist(words)
cutoff = scoreatpercentile(freq.values(),15)
vocab = [word for word,f in freq.items() if f > cutoff] 
cPickle.dump({'distribution':freq,'cutoff':cutoff},open('freqdist_2.pkl',WRITE))

The error message is:

File "C:\Python27\lib\site-packages\scipy\stats\stats.py", line 1419, in scoreatpercentile
score = _interpolate(values[int(idx)], values[int(idx)+1],
IndexError: index out of bounds

This code runs fine on other computers... I'm not sure what I'm missing here.

1 Answer

0

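Before passing the contents of nltk.FreqDist(words) to scipy's scoreatpercentile function, you need to check what is actually in it. If, for instance, words ends up empty on this machine, there are no values for scoreatpercentile to index into, which could plausibly produce an error like this one.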
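As a rough illustration of that kind of check, here is a minimal sketch. It uses `collections.Counter` as a stand-in for `FreqDist` so that it runs without NLTK installed; that substitution and the helper name `counts_for_percentile` are assumptions for this example, not part of the original code.

```python
from collections import Counter


def counts_for_percentile(freq):
    """Return the counts as a sorted list, or fail with a clear message.

    `freq` is any mapping of word -> count (a Counter here, standing in
    for nltk.FreqDist). The name of this helper is hypothetical.
    """
    counts = sorted(freq.values())
    if not counts:
        # An empty distribution gives a percentile function nothing to index.
        raise ValueError("frequency distribution is empty")
    return counts


freq = Counter("foo bar bar black".split())
counts = counts_for_percentile(freq)
print(len(counts), counts)  # number of distinct words, then their counts
```

Calling the helper on an empty `Counter()` raises a `ValueError` with a readable message rather than an `IndexError` from deep inside the percentile code.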

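If you want a simpler way to get a cutoff without scoreatpercentile, here is an example. Note that it sets the threshold at 15% of the total word count, which is not the same thing as the 15th percentile of the frequencies: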

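from nltk.probability import FreqDist

# Sample tokens; keep only the words listed in `sublist`.
words = "this is a foo bar bar bar bar black black sheep sentence".split()
sublist = "foo bar black sheep sentence".split()
words = [i for i in words if i in sublist]

# Count occurrences, then set the cutoff at 15% of the total token count.
word_freq = FreqDist(words)
cutoff = 15*sum(word_freq.values())/float(100)

# Keep only the words that occur more often than the cutoff.
vocab = [word for word,f in word_freq.items() if f > cutoff]

print(vocab)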

[Output]:

['bar', 'black']
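For comparison, the sketch below puts a share-of-total cutoff next to a 15th-percentile cutoff on the same sample words. It is a plain-Python approximation under stated assumptions: `Counter` stands in for `FreqDist`, and `percentile_cutoff` is a hypothetical helper that does simple linear interpolation between sorted values. It is not scipy's implementation.

```python
from collections import Counter
import math


def percentile_cutoff(counts, per):
    """Percentile of `counts` by linear interpolation (hypothetical helper)."""
    values = sorted(counts)
    if not values:
        raise ValueError("no counts to take a percentile of")
    idx = per / 100.0 * (len(values) - 1)  # fractional position in the sorted list
    lower = int(math.floor(idx))
    frac = idx - lower
    if frac == 0:
        return float(values[lower])
    return values[lower] + (values[lower + 1] - values[lower]) * frac


words = "foo bar bar bar bar black black sheep sentence".split()
freq = Counter(words)

share_cutoff = 15 * sum(freq.values()) / 100.0    # 15% of the 9 tokens
pct_cutoff = percentile_cutoff(freq.values(), 15)  # 15th percentile of [1, 1, 1, 2, 4]

vocab_share = [w for w, f in freq.items() if f > share_cutoff]
vocab_pct = [w for w, f in freq.items() if f > pct_cutoff]
print(share_cutoff, pct_cutoff, vocab_share, vocab_pct)
```

On this small sample the two thresholds differ (1.35 versus 1.0) but select the same words; on a larger vocabulary they can diverge considerably, so it is worth deciding which definition you actually want.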
