I've seen this question on Stack Overflow before, but the solution didn't work for me; see Save and Load testing classify Naive Bayes Classifier in NLTK in another method. Why is my accuracy so far off when I load a pickled classifier, compared to training and classifying in the same program? The first code block calls the pickled classifier; the second does all the training and classifying together. The second approach gets 99% accuracy, while the first only gets 81%...
import csv
import pickle
import re

import nltk
from nltk.tokenize import WordPunctTokenizer
from nltk.util import bigrams

# load the previously trained and pickled classifier
Academic_classifier = pickle.load(open('Academic_classifier.pickle', 'rb'))

# read the test tweets from CSV (raw string so '\U' and '\T' are not escapes)
tweets = []
readdata = csv.reader(open(r'C:\Users\Troy\Documents\Data\Gold_test.csv', 'r'))
for row in readdata:
    tweets.append(row)
Header = tweets[0]
tweets.pop(0)
Academic_test_tweets = tweets[:]

# tokenize, normalize repeated characters, and append joined bigrams
Tweets = []
for (words, sentiment) in tweets:
    bigram = []
    bigram_list = []
    words_filtered = [e.lower() for e in WordPunctTokenizer().tokenize(words) if len(e) >= 3]
    words_filtered = [re.sub(r'(.)\1+', r'\1\1', e) for e in words_filtered if len(e) >= 3]
    bigram_words = bigrams(words_filtered)
    for x in bigram_words:
        bigram.append(x)
    for bi in bigram:
        bigram_word = bi[0] + bi[1]
        bigram_list.append(bigram_word)
    list_to_append = words_filtered + bigram_list
    Tweets.append((list_to_append, sentiment))

Academic_test_tweets_words = Tweets[:]
word_features = get_word_features(get_words_in_tweets(Academic_test_tweets_words))
Academic_test_set = nltk.classify.apply_features(extract_features, Academic_test_tweets_words)
print(nltk.classify.accuracy(Academic_classifier, Academic_test_set),
      'tweet corpus used in academic paper "Sentiment Analysis on the Social Networks Using '
      'Stream Algorithms", authors: Nathan Aston, Timothy Munson, Jacob Liddle, Garrett Hartshaw, '
      'Dane Livingston, Wei Hu *compare to their accuracy of 87.5%')
This contrasts with my code that trains and tests accuracy in one run. I use the same definitions for everything, so I know the problem isn't in the definitions. The only difference is the pickled classifier... What's going on?
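One thing worth double-checking: in the first block, `word_features` is rebuilt from the *test* CSV, while the pickled classifier was trained on features derived from the *training* data. If the two vocabularies differ, `extract_features` produces different feature dicts than the ones the classifier saw at training time, and accuracy drops even though the classifier itself round-trips through pickle unchanged. A minimal sketch of that effect (the `extract_features` below is a stand-in for the helper in my code, and the vocabularies are made up, not from the real data):

```python
import pickle

# Minimal stand-in for the post's extract_features helper (assumed shape,
# not the original definition): mark which vocabulary words a tweet contains.
def extract_features(tweet_words, word_features):
    words = set(tweet_words)
    return {f'contains({w})': (w in words) for w in word_features}

train_vocab = ['great', 'awful', 'happy']  # vocabulary seen at training time
test_vocab = ['great', 'meh']              # vocabulary rebuilt from test data

tweet = ['great', 'day']
# Same tweet, different vocabularies -> different feature dicts.
assert extract_features(tweet, train_vocab) != extract_features(tweet, test_vocab)

# Pickling the vocabulary alongside the classifier keeps the feature
# space fixed across sessions (None stands in for the trained model).
blob = pickle.dumps({'classifier': None, 'word_features': train_vocab})
restored = pickle.loads(blob)
assert restored['word_features'] == train_vocab
```

If this is the cause, pickling `word_features` together with the classifier and reusing it at load time should make the two runs agree.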
[second code block, which trains and classifies in one run, did not survive formatting]