NLTK Naive Bayes classifier input formatting

Asked 2025-04-18 16:12

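I've run into a problem I can't figure out. I'm fairly new to Python and NLTK and am trying to build a Naive Bayes classifier, but I'm not sure what format the input should take: a list of tuples, a dictionary, or a tuple containing two lists.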

I tried the following format, and it fails with AttributeError: 'str' object has no attribute 'items'

[('maggie: just a push button. and the electric car uses sensors to drive itself. \n', 'notending')]
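This first error occurs because `NaiveBayesClassifier.train` expects each training example to be a `(featureset, label)` pair in which `featureset` is a dict mapping feature names to values; it calls `.items()` on that first element, and a plain string has no `.items()`. A minimal sketch of turning the raw line into a valid pair, assuming simple lowercased word-presence features (the helper `text_to_featureset` is hypothetical, not part of NLTK):

```python
def text_to_featureset(text):
    """Hypothetical helper: map each lowercased, whitespace-separated token to True."""
    return {word: True for word in text.lower().split()}

raw = [('maggie: just a push button. and the electric car uses sensors to drive itself. \n', 'notending')]

# Replace each raw string with a feature dict; the label is kept unchanged.
train_data = [(text_to_featureset(text), label) for text, label in raw]

featureset, label = train_data[0]
print(label)                  # notending
print(featureset['sensors'])  # True
```

Any other feature encoding works as long as each example ends up as a `(dict, label)` tuple.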

This format also fails, with AttributeError: 'list' object has no attribute 'items'

[([['the', 'fire', 'chief', 'says', 'someone', 'started', 'the', 'blaze', 'on', 'purpose', 'as', 'a', 'controlled', 'burn', ',', 'but', 'it', 'quickly', 'got', 'out', 'of', 'hand', '.']], 'notending')]
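Here the tokens are correct, but the first element of the pair is still a list (nested one level too deep, `[[...]]`), and lists have no `.items()` either. A sketch, assuming the same word-presence features, that flattens the nested list and builds the dict (`tokens_to_featureset` is a hypothetical name):

```python
from itertools import chain

def tokens_to_featureset(token_lists):
    """Hypothetical helper: flatten a list of token lists and mark each token as present."""
    return {token: True for token in chain.from_iterable(token_lists)}

example = [([['the', 'fire', 'chief', 'says', 'someone', 'started', 'the', 'blaze',
              'on', 'purpose', 'as', 'a', 'controlled', 'burn', ',', 'but', 'it',
              'quickly', 'got', 'out', 'of', 'hand', '.']], 'notending')]

# Each (nested token list, label) pair becomes a (feature dict, label) pair.
train_data = [(tokens_to_featureset(tokens), label) for tokens, label in example]
print(len(train_data[0][0]))  # 22: the repeated 'the' collapses into one key
```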

If I use a dictionary instead, I get ValueError: too many values to unpack

{'everyone: bye!': 'ending'}
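The dict version fails for a different reason: the trainer loops over its argument expecting pairs, and iterating a dict yields only its keys. Each key is a string, so Python tries to unpack a string such as `'everyone: bye!'` into two names, which raises `ValueError`. A sketch that reproduces the failure and then converts a `{text: label}` dict into the expected list of pairs, again assuming word-presence features:

```python
labelled = {'everyone: bye!': 'ending'}

# Reproduce the failure: iterating a dict yields only its keys (strings),
# and a string longer than two characters cannot be unpacked into two names.
try:
    for featureset, label in labelled:
        pass
    unpack_failed = False
except ValueError:
    unpack_failed = True

# .items() yields (text, label) pairs; turn each text into a feature dict.
train_data = [({word: True for word in text.lower().split()}, label)
              for text, label in labelled.items()]
print(train_data)  # [({'everyone:': True, 'bye!': True}, 'ending')]
```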

The code that calls the Naive Bayes classifier is classifier = nltk.NaiveBayesClassifier.train(d_train)

I'm not sure what I'm doing wrong. Thanks in advance for any help!

Tags: error-handling, data-structures, machine-learning, nltk, classifier, naive-bayes, input-format

1 Answer

from nltk.classify import NaiveBayesClassifier
from nltk.corpus import stopwords

# Run nltk.download('stopwords') once if the stopword corpus is not installed.
stopset = set(stopwords.words('english'))  # a set gives fast membership tests

def word_feats(words):
    # Map each whitespace-separated token that is not a stopword to True.
    return {word: True for word in words.split() if word not in stopset}

posids = ['I love this sandwich.', 'I feel very good about these beers.']
negids = ['I hate this sandwich.', 'I feel worst about these beers.']

# Each training example is a (feature dict, label) tuple.
pos_feats = [(word_feats(f), 'positive') for f in posids]
neg_feats = [(word_feats(f), 'negative') for f in negids]
print(pos_feats)
print(neg_feats)

trainfeats = pos_feats + neg_feats
classifier = NaiveBayesClassifier.train(trainfeats)

This prints the positive and negative feature sets:

[({'I': True, 'love': True, 'sandwich.': True}, 'positive'), ({'I': True, 'feel': True, 'good': True, 'beers.': True}, 'positive')]
[({'I': True, 'hate': True, 'sandwich.': True}, 'negative'), ({'I': True, 'feel': True, 'beers.': True, 'worst': True}, 'negative')]
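Two details show up in this output: `'I'` survives the stopword filter because NLTK's English stopword list is all lowercase, and `'sandwich.'` keeps its period because `str.split()` splits only on whitespace, so `sandwich.` and `sandwich` would count as different features. A stdlib-only sketch of a normalizing variant, assuming lowercasing and stripping surrounding ASCII punctuation suit the task; the small `STOPWORDS` set is an illustrative stand-in for NLTK's list, not the real one:

```python
import string

# Illustrative stand-in for stopwords.words('english'); NOT the full NLTK list.
STOPWORDS = {'i', 'this', 'very', 'about', 'these'}

def normalized_word_feats(text):
    """Lowercase each token, strip surrounding punctuation, then drop stopwords."""
    feats = {}
    for raw in text.split():
        word = raw.lower().strip(string.punctuation)
        if word and word not in STOPWORDS:
            feats[word] = True
    return feats

print(normalized_word_feats('I love this sandwich.'))
# {'love': True, 'sandwich': True}
```

With the real stopword set, the same function can replace `word_feats` unchanged, since it returns the same kind of dict.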

So if you ask the classifier to label the sentence 'I hate everything'

print(classifier.classify(word_feats('I hate everything')))

the result is 'negative'.
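As for why the question's three inputs failed: the error messages point to two steps in the trainer, a loop of the form `for featureset, label in ...` and a call to `featureset.items()`. The function below is a simplified sketch of that counting pass, not NLTK's implementation (`count_features` is hypothetical); it accepts data shaped like `trainfeats` and fails on each of the question's shapes with the same exception types:

```python
from collections import Counter

def count_features(labeled_featuresets):
    """Simplified sketch of a Naive Bayes counting pass (not NLTK's code)."""
    label_counts = Counter()
    feature_counts = Counter()
    for featureset, label in labeled_featuresets:  # each item must be a 2-item pair
        label_counts[label] += 1
        for fname, fval in featureset.items():     # featureset must be a dict
            feature_counts[(label, fname, fval)] += 1
    return label_counts, feature_counts

good = [({'bye!': True}, 'ending'), ({'car': True, 'sensors': True}, 'notending')]
labels, feats = count_features(good)
print(labels['ending'], feats[('notending', 'car', True)])  # 1 1

# The three shapes from the question, reduced to one example each.
bad_inputs = [
    [('some text', 'ending')],          # str instead of dict
    [([['some', 'text']], 'ending')],   # nested list instead of dict
    {'everyone: bye!': 'ending'},       # dict: iteration yields keys only
]
errors = []
for bad in bad_inputs:
    try:
        count_features(bad)
    except (AttributeError, ValueError) as err:
        errors.append(type(err).__name__)
print(errors)  # ['AttributeError', 'AttributeError', 'ValueError']
```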

Answered 2025-04-18 by Python大师
