IndexError: index out of bounds
I'm using NLTK's FreqDist object to build a cPickle file. For some reason, though, I get an index-out-of-bounds error on the "cutoff..." line.
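import cPickle
import nltk
from scipy.stats import scoreatpercentile
# words is a list of token lists built earlier; flatten it into one list
words = [item for sublist in words for item in sublist]
freq = nltk.FreqDist(words)
cutoff = scoreatpercentile(freq.values(),15)
vocab = [word for word,f in freq.items() if f > cutoff]
cPickle.dump({'distribution':freq,'cutoff':cutoff},open('freqdist_2.pkl','wb'))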
The error message is:
File "C:\Python27\lib\site-packages\scipy\stats\stats.py", line 1419, in scoreatpercentile
score = _interpolate(values[int(idx)], values[int(idx)+1],
IndexError: index out of bounds
This code runs fine on other computers, so I'm not sure what I'm missing here.
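The failing line in the traceback interpolates between `values[int(idx)]` and `values[int(idx)+1]`. The Python 3 sketch below re-creates that step, assuming this SciPy version places the percentile at `per / 100 * (n - 1)` on the sorted data and interpolates linearly. This is an assumption about that SciPy release, not its actual source, and `percentile_sketch` is a hypothetical name. Under that assumption, an empty `freq.values()` (for example, if `words` ended up empty on this machine) is enough to trigger the IndexError: `idx` becomes -0.15, the code takes the interpolation branch, and it indexes into an empty list.

```python
def percentile_sketch(data, per):
    """Hypothetical re-creation of a linear-interpolation percentile.

    Assumes the position is per/100 * (n - 1) on the sorted data, mirroring
    the values[int(idx)] / values[int(idx)+1] lookup seen in the traceback.
    """
    values = sorted(data)
    idx = per / 100.0 * (len(values) - 1)
    if idx % 1 == 0:
        # Exact position: no interpolation needed.
        return values[int(idx)]
    # Fractional position: interpolate between the two neighbouring values.
    lower = values[int(idx)]
    upper = values[int(idx) + 1]
    return lower + (upper - lower) * (idx % 1)


print(percentile_sketch([1, 1, 1, 2, 4], 15))

try:
    # With no data, idx = 0.15 * (0 - 1) = -0.15, so values[0] is read from [].
    percentile_sketch([], 15)
except IndexError as exc:
    print("IndexError:", exc)
```

If that is the cause, the fix is upstream: find out why `words` is empty on this computer (a missing or different input file, for instance) rather than changing the percentile call.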
1 Answer
You need to check what is inside nltk.FreqDist(words) before passing it to scipy's scoreatpercentile function.
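A minimal Python 3 sketch of that check. It assumes `collections.Counter` as a stand-in for `nltk.FreqDist` so it runs with the standard library only, and `describe_freqs` is a hypothetical helper name. It summarizes the counts and fails early with a readable message when the distribution is empty, instead of letting the percentile routine raise an opaque IndexError.

```python
from collections import Counter


def describe_freqs(freq):
    """Hypothetical helper: summarize a frequency mapping before it is
    passed to a percentile function, and fail early if it is empty."""
    counts = list(freq.values())  # materialize the values as a plain list
    if not counts:
        raise ValueError("frequency distribution is empty; check the input tokens")
    return {
        "distinct_words": len(counts),
        "total_tokens": sum(counts),
        "min_count": min(counts),
        "max_count": max(counts),
    }


# Counter stands in for nltk.FreqDist here.
token_lists = [["the", "cat"], ["the", "dog"]]
flat = [item for sublist in token_lists for item in sublist]
print(describe_freqs(Counter(flat)))

try:
    describe_freqs(Counter([]))  # e.g. the input produced no tokens
except ValueError as exc:
    print("ValueError:", exc)
```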
If you want a simpler way to get a cutoff without calling scoreatpercentile, here is an example:
from nltk.probability import FreqDist
words = "this is a foo bar bar bar bar black black sheep sentence".split()
# keep only the words that appear in this list
sublist = "foo bar black sheep sentence".split()
words = [i for i in words if i in sublist]
word_freq = FreqDist(words)
# cutoff = 15% of the total number of tokens
cutoff = 15*sum(word_freq.values())/float(100)
vocab = [word for word,f in word_freq.items() if f > cutoff]
print(vocab)
[out]:
['bar', 'black']
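The cutoff in this answer is 15% of the total token count, which is a different quantity from the 15th percentile of the per-word frequencies that scoreatpercentile computes. The Python 3 sketch below compares the two on the same example. It assumes `collections.Counter` in place of `FreqDist` and a hand-written linear-interpolation percentile (`percentile_linear`, a hypothetical name) that is assumed, not verified, to match SciPy's default behaviour for this kind of input.

```python
from collections import Counter


def percentile_linear(data, per):
    """Linear-interpolation percentile on sorted data (no SciPy needed)."""
    values = sorted(data)
    if not values:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = per / 100.0 * (len(values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)  # clamp so the last element is never overrun
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


words = "this is a foo bar bar bar bar black black sheep sentence".split()
keep = set("foo bar black sheep sentence".split())
freq = Counter(w for w in words if w in keep)

share_cutoff = 15 * sum(freq.values()) / 100.0     # the answer's cutoff: 15% of all tokens
pct_cutoff = percentile_linear(freq.values(), 15)  # 15th percentile of the word counts

print(share_cutoff, pct_cutoff)
print(sorted(w for w, f in freq.items() if f > share_cutoff))
print(sorted(w for w, f in freq.items() if f > pct_cutoff))
```

On this small example both cutoffs keep `['bar', 'black']`, but because they measure different things they can select different vocabularies on a larger corpus.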