ValueError: too many values to unpack in scikitlearn.train

0 votes
1 answer
1043 views
Asked 2025-04-18 06:22

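I'm doing sentiment analysis and want to test the accuracy of several classifiers. If I don't convert the training set to a dictionary, I get the error "AttributeError: 'tuple' object has no attribute 'iterkeys'". But after converting it to a dictionary, I get a different error: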

Traceback (most recent call last):
  File "E:\Python27\accuracy.py", line 204, in <module>
    print 'BernoulliNB`s accuracy is %f' %score(BernoulliNB())
  File "E:\Python27\accuracy.py", line 200, in score
    classifier.train(trainset)
  File "E:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 93, in train
    for fs, label in labeled_featuresets:
ValueError: too many values to unpack

Relevant code:

trainset = extracted_pos_features[50:]+extracted_neg_features[50:]
testset = extracted_pos_features[:50]+extracted_neg_features[:50]
dict1 = {}
for i,j in trainset:
    dict1.setdefault(j,[]).append(i)

trainset = dict1

test, tag_test = zip(*testset)

def score(classifier):
    classifier = SklearnClassifier(classifier)
    classifier.train(trainset)
    pred = classifier.batch_classify(test)
    return accuracy_score(tag_test, pred)

print 'BernoulliNB`s accuracy is %f' %score(BernoulliNB())

The dictionary dict1 has two keys, 'neg' and 'pos', and each key maps to a list of values:

dict1

{'neg': [('tone', 'ultimately'), ('tragedy', 'core'), ('ultimately', 'dulls'), ('update', 'dreary'), ('version', 'looks'), ('voice', 'lack'), ('worst', 'film'), ('yarn', 'eloquent'), ('makes', 'little'), ('makes', 'maryam'), ('remain', 'true'), ('screen', 'time'), ('sluggish', 'time'), ('thesis', 'makes'), ('time', 'machine'), ('true', 'chan'), ('true', 'original'), ('unashamedly', 'makes'), ('time', 'true')], 

'pos': [('rock', 'destined'), ('schwarzenegger', 'van'), ('screenplay', 'curls'), ('segal', 'gorgeously'), ('slice', 'asian'), ('snappy', 'screenplay'), ('somehow', 'pulls'), ('sometimes', 'movies'), ('splash', 'arnold'), ('start', 'emerges'), ('steers', 'snappy'), ('steven', 'segal'), ('top', 'game'), ('trilogy', 'huge'), ('van', 'damme'), ('vision', 'effective'), ('wasabi', 'start'), ('words', 'adequately'), ('cat', 'offers'), ('emerges', 'rare'), ('game', 'offers'), ('offers', 'refreshingly'), ('rare', 'combination'), ('rare', 'issue'), ('offers', 'rare')]}

Does anyone know how to fix this? Thanks a lot.

1 Answer

0

This is a mistake I often make with dictionaries: forgetting to call items() on the dict when iterating over it:

dct = {"aaa": 11, "bbb": 22, "ccc": 33}

for key, val in dct.items():
    print "key", key
    print "val", val

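Without items(), iterating over a dict yields only its keys, and the tuple unpacking in the for loop then tries to treat each key as a sequence to split up.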
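To make the difference concrete, here is a minimal sketch that reuses the dct example and runs on both Python 2 and Python 3. The names keys, pairs and unpack_error are only illustrative.

```python
dct = {"aaa": 11, "bbb": 22, "ccc": 33}

# Iterating over the dict itself yields only the keys.
keys = sorted(key for key in dct)

# items() yields (key, value) pairs, which unpack cleanly into two names.
pairs = sorted((key, val) for key, val in dct.items())

# Unpacking a bare key into two names treats the string as a sequence
# of characters; "aaa" has three, so the unpacking fails.
try:
    first, second = "aaa"
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)
```

Here unpack_error ends up holding the same "too many values to unpack" message that appears in the traceback.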

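In your case, the loop inside train() iterates over dict1 directly, so it receives the keys 'neg' and 'pos' and tries to unpack each one as a sequence of characters. Each key has three characters rather than two, so it cannot be unpacked into the two variables fs, label.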
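One possible way out, offered as a sketch and not a confirmed fix: as far as I know, SklearnClassifier.train() expects a list of (featureset, label) pairs in which each featureset is a dict, which would also explain the earlier 'iterkeys' error on tuples. The sample data, the to_featureset helper and the {"word1 word2": True} encoding below are assumptions for illustration, not taken from the question's real feature extraction.

```python
# Hypothetical sample shaped like the question's (bigram, label) pairs.
labeled_bigrams = [
    (("worst", "film"), "neg"),
    (("time", "machine"), "neg"),
    (("rare", "combination"), "pos"),
    (("van", "damme"), "pos"),
]

def to_featureset(bigram):
    """Map a bigram tuple to a dict featureset (an assumed encoding)."""
    return {" ".join(bigram): True}

# Keep a list of (dict, label) pairs instead of grouping by label.
trainset = [(to_featureset(bigram), label) for bigram, label in labeled_bigrams]

# Emulate the loop from the traceback: every element unpacks into two names.
unpacked = [(fs, label) for fs, label in trainset]

# For contrast, the grouped dict from the question fails in the same loop,
# because iterating over it yields the keys 'neg' and 'pos'.
grouped = {}
for bigram, label in labeled_bigrams:
    grouped.setdefault(label, []).append(bigram)

try:
    for fs, label in grouped:
        pass
    grouped_error = None
except ValueError as exc:
    grouped_error = str(exc)
```

A trainset built this way keeps each label next to its own featureset, so it should be possible to pass it to classifier.train() without the dict1 conversion.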
