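I'm trying to write code that uses a Naive Bayes classifier, but it isn't working well.
My data is the number of profile views for an ID on a social network, and I want to decide whether that ID really belongs to a real person.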
```python
from __future__ import division
from collections import defaultdict
from math import log

def train(samples):
    classes, freq = defaultdict(lambda:0), defaultdict(lambda:0)
    for feats, label in samples:
        classes[label] += 1                  # count classes frequencies
        for feat in feats:
            freq[label, feat] += 1           # count features frequencies

    for label, feat in freq:                 # normalize features frequencies
        freq[label, feat] /= classes[label]
    for c in classes:                        # normalize classes frequencies
        classes[c] /= len(samples)

    return classes, freq                     # return P(C) and P(O|C)

def classify(classifier, feats):
    classes, prob = classifier
    return min(classes.keys(),               # calculate argmin(-log(C|O))
        key = lambda cl: -log(classes[cl]) + \
            sum(-log(prob.get((cl,feat), 10**(-7))) for feat in feats))

def get_features(sample): return (str(sample)[2:5],)

samples = (line.decode('utf-8').split() for line in open('sample.txt'))
features = [(get_features(feat), label) for feat, label in samples]
classifier = train(features)

print 'Is it real ID: ', classify(classifier, get_features(0.132))
```
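For anyone running this on Python 3 (where the `print` statement and `str.decode` no longer exist), here is a minimal port of the same `train`/`classify` logic. It is a sketch that assumes the training data is just the six `sample.txt` rows shown in this question, copied inline as a hypothetical `SAMPLE_ROWS` list. `class_scores` is also a hypothetical helper, not part of the original code; it only exposes the per-class value that `min()` compares, so you can see why a given label wins.

```python
from collections import defaultdict
from math import log

# Hypothetical inline copy of the six sample.txt rows: (profile-view value, label).
SAMPLE_ROWS = [
    ("0.019546345", "1"),
    ("0.009769094", "0"),
    ("0.000888099", "0"),
    ("0.004440497", "0"),
    ("0.009769094", "0"),
    ("0.000888099", "0"),
]

UNSEEN = 10 ** -7  # fallback P(feature | class) used by the original classify()


def get_features(sample):
    # Same slicing as the question: characters 2..4 of the string, e.g. "0.132" -> "132".
    return (str(sample)[2:5],)


def train(samples):
    classes, freq = defaultdict(int), defaultdict(int)
    for feats, label in samples:
        classes[label] += 1                # count class frequencies
        for feat in feats:
            freq[label, feat] += 1         # count (class, feature) frequencies
    for label, feat in list(freq):         # turn counts into P(feature | class)
        freq[label, feat] /= classes[label]
    for c in list(classes):                # turn counts into P(class)
        classes[c] /= len(samples)
    return classes, freq


def class_scores(classifier, feats):
    # Hypothetical helper: -log P(C) - sum(log P(f | C)) for each class,
    # i.e. the quantity the original classify() minimises.
    classes, prob = classifier
    return {
        c: -log(classes[c]) + sum(-log(prob.get((c, f), UNSEEN)) for f in feats)
        for c in classes
    }


def classify(classifier, feats):
    scores = class_scores(classifier, feats)
    return min(scores, key=scores.get)     # argmin over classes


classifier = train([(get_features(v), label) for v, label in SAMPLE_ROWS])
print("Is it real ID:", classify(classifier, get_features(0.132)))  # prints: Is it real ID: 0
print(class_scores(classifier, get_features(0.132)))
```

With this data, a query whose 3-character slice does occur in training, such as 0.0195 (slice `'019'`), comes back as `'1'`, so the code itself runs as written; the question is what the slice feature can and cannot capture.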
sample.txt looks like this:
```
0.019546345 1
0.009769094 0
0.000888099 0
0.004440497 0
0.009769094 0
0.000888099 0
```
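To see what the classifier actually learns from these rows, the short Python 3 sketch below applies the same `get_features` slicing to each value and to the query 0.132. The rows are copied inline as a hypothetical `SAMPLE_ROWS` list instead of being read from the file.

```python
# Hypothetical inline copy of the sample.txt rows: (value, label).
SAMPLE_ROWS = [
    ("0.019546345", "1"),
    ("0.009769094", "0"),
    ("0.000888099", "0"),
    ("0.004440497", "0"),
    ("0.009769094", "0"),
    ("0.000888099", "0"),
]


def get_features(sample):
    # Same slicing as the question: characters 2..4 of str(sample).
    return (str(sample)[2:5],)


training_features = {get_features(value) for value, _ in SAMPLE_ROWS}
query = get_features(0.132)

for value, label in SAMPLE_ROWS:
    print(value, label, get_features(value))
print("query:", query, "seen in training:", query in training_features)
```

The slice keeps the first three digits after `0.` as an opaque string, so `'132'` is treated as a brand-new category rather than as a number larger than every training value.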
When I run `print 'Is it real ID: ', classify(classifier, get_features(0.132))`, it always returns 0.
Where did I go wrong? Any ideas?
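A hand check of the scores `classify` compares for 0.132, assuming the six rows above are the entire training file: the feature `'132'` never occurs in training, so both classes fall back to the same $10^{-7}$ likelihood, and only the prior $P(C)$ ($1/6$ for class 1, $5/6$ for class 0) separates them. Here $\log$ is the natural log, as in `math.log`.

$$
\begin{aligned}
\operatorname{score}(c) &= -\log P(c) - \log P(\texttt{'132'} \mid c) \\
\operatorname{score}(1) &= -\log\tfrac{1}{6} - \log 10^{-7} \approx 1.792 + 16.118 = 17.910 \\
\operatorname{score}(0) &= -\log\tfrac{5}{6} - \log 10^{-7} \approx 0.182 + 16.118 = 16.300 \\
\operatorname{score}(1) - \operatorname{score}(0) &= \log 6 - \log\tfrac{6}{5} = \log 5 \approx 1.609 > 0
\end{aligned}
$$

The difference $\log 5$ is the same for every value whose slice never appears in training, so the argmin returns class 0 regardless of the value; only a slice that occurs in the class 1 training rows (here, `'019'`) can outweigh the prior.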