SGDClassifier does not give as good a result as LogisticRegression

Posted 2024-04-25 23:17:09


I am training a dataset with sklearn's LogisticRegression and SGDClassifier, with log as the loss function. I am using log loss as my evaluation metric.

However, the log loss of the SGDClassifier is much higher (0.66) than that of LogisticRegression (0.48).

I have tried tuning the parameters alpha, n_iter, learning_rate and max_iter, but with no luck.
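For reference, sklearn's log_loss (the metric both numbers above come from) expects predicted class probabilities, not hard labels; a minimal sketch with made-up probabilities:

```python
from sklearn.metrics import log_loss

# toy labels and predicted probabilities (made up for illustration)
y_true = [0, 1, 1, 0]
y_prob = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]

# mean negative log of the probability assigned to the true class
loss = log_loss(y_true, y_prob)
print(round(loss, 3))  # ~0.299
```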

Code for the LogisticRegression classifier:

from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

# Note: LogisticRegression's C is the INVERSE of the regularization
# strength, even though this list is named alpha.
alpha = [10 ** x for x in range(-5, 3)]

log_error_array = []
for i in alpha:
    # penalty='l1' needs a solver that supports it, e.g. liblinear
    clf = LogisticRegression(C=i, penalty='l1', solver='liblinear',
                             random_state=42, class_weight='balanced')
    clf.fit(tr, approval_tr)
    # calibrate the classifier's scores into probabilities
    sig_clf = CalibratedClassifierCV(clf, method="sigmoid")
    sig_clf.fit(tr, approval_tr)
    predict_y = sig_clf.predict_proba(cv)
    loss = log_loss(approval_cv, predict_y)
    log_error_array.append(loss)
    print('For values of alpha = ', i, "The log loss is:", loss)

Output:

For values of alpha =  1e-05 The log loss is: 0.6649895381677852
For values of alpha =  0.0001 The log loss is: 0.6649874120729949
For values of alpha =  0.001 The log loss is: 0.6649874120658615
For values of alpha =  0.01 The log loss is: 0.546799752877368
For values of alpha =  0.1 The log loss is: 0.49969119164808273
For values of alpha =  1 The log loss is: 0.4768379193463679
For values of alpha =  10 The log loss is: 0.4838656842062527
For values of alpha =  100 The log loss is: 0.4969062791884036
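One detail worth noting when comparing the two grids: LogisticRegression's C is the inverse of the regularization strength, while SGDClassifier's alpha multiplies the penalty directly, roughly alpha = 1 / (C * n_samples). So the same numeric grid means very different regularization strengths for the two models. A quick sketch (the n_samples value here is a placeholder, not from the post):

```python
# map each C in the grid above to the roughly equivalent SGD alpha
n_samples = 10_000  # hypothetical training-set size
pairs = [(C, 1.0 / (C * n_samples)) for C in (10 ** x for x in range(-5, 3))]
for C, a in pairs:
    print(f"C = {C:g}  ->  alpha ~ {a:g}")
```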

For the SGDClassifier:

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

params = {'alpha': [10 ** x for x in range(-4, 3)],
          'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
         }

# n_iter has been removed from SGDClassifier (max_iter replaces it);
# loss='log' was renamed to 'log_loss' in scikit-learn >= 1.1
clf = SGDClassifier(loss='log', class_weight='balanced', random_state=42,
                    eta0=10, penalty='l2', max_iter=2000)
tuned_clf = GridSearchCV(clf, param_grid=params, scoring='neg_log_loss',
                         verbose=1, n_jobs=-1)
tuned_clf.fit(tr, approval_tr)
tuned_clf.best_score_

Output:

-0.6887660472967351

Any suggestions on what I am missing here?

