I am training a dataset with sklearn's LogisticRegression and SGDClassifier, with log as the loss function, and I am using log loss as my evaluation metric. However, the SGDClassifier's log loss is much higher (0.66) than LogisticRegression's (0.48).

I have tried tuning the alpha, n_iter, learning_rate, and max_iter parameters, but with no luck.
Code for the LogisticRegression classifier:
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

alpha = [10 ** x for x in range(-5, 3)]  # values tried for C (inverse regularization strength)
log_error_array = []
for i in alpha:
    clf = LogisticRegression(C=i, penalty='l1', solver='liblinear',
                             random_state=42, class_weight='balanced')
    clf.fit(tr, approval_tr)
    sig_clf = CalibratedClassifierCV(clf, method="sigmoid")
    sig_clf.fit(tr, approval_tr)
    predict_y = sig_clf.predict_proba(cv)
    log_error_array.append(log_loss(approval_cv, predict_y, eps=1e-15))
    print('For values of alpha = ', i, "The log loss is:",
          log_loss(approval_cv, predict_y, eps=1e-15))
Output:
For values of alpha = 1e-05 The log loss is: 0.6649895381677852
For values of alpha = 0.0001 The log loss is: 0.6649874120729949
For values of alpha = 0.001 The log loss is: 0.6649874120658615
For values of alpha = 0.01 The log loss is: 0.546799752877368
For values of alpha = 0.1 The log loss is: 0.49969119164808273
For values of alpha = 1 The log loss is: 0.4768379193463679
For values of alpha = 10 The log loss is: 0.4838656842062527
For values of alpha = 100 The log loss is: 0.4969062791884036
For the SGDClassifier:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

params = {'alpha': [10 ** x for x in range(-4, 3)],
          'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive']}
# n_iter is the deprecated alias of max_iter, so only max_iter is passed here.
clf = SGDClassifier(loss='log', class_weight='balanced', random_state=42,
                    eta0=10, penalty='l2', max_iter=2000)
tuned_clf = GridSearchCV(clf, param_grid=params, scoring='neg_log_loss',
                         verbose=1, n_jobs=-1)
tuned_clf.fit(tr, approval_tr)
tuned_clf.best_score_
Output:
-0.6887660472967351
Any suggestions on what I am missing here?