How does scikit-learn's GridSearchCV compute best_score_?

6 votes

1 answer

7257 views

数据工程师

Asked 2025-04-18 08:57

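I've been trying to figure out how the best_score_ attribute of GridSearchCV is calculated (in other words, what it actually means).

The documentation says:

Score of best_estimator on the left out data.

So I tried to translate that into something I understand: I computed the r2_score between the actual y and the predicted y collected from each k-fold split, but the result was different. This is the code I used: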

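import numpy as np
from sklearn.metrics import r2_score
# Collect out-of-fold predictions; kfold yields (train_ind, test_ind) pairs
test_pred = np.full(y.shape, np.nan)
for train_ind, test_ind in kfold:
    clf.best_estimator_.fit(X[train_ind, :], y[train_ind])
    test_pred[test_ind] = clf.best_estimator_.predict(X[test_ind, :])
# A single R^2 over all pooled predictions, not a mean of per-fold scores
r2_test = r2_score(y, test_pred)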

I've looked everywhere for a more meaningful explanation of best_score_ but couldn't find one. Would anyone care to explain?

Thanks

machine-learning model-evaluation hyperparameter-tuning cross-validation gridsearchcv estimator r2_score best_score

1 Answer

It is the mean cross-validation score of the best model. Let's generate some data and fix the cross-validation splits.

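>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
>>> y = np.linspace(-5, 5, 200)
>>> X = (y + np.random.randn(200)).reshape(-1, 1)
>>> threefold = list(KFold(n_splits=3).split(X))  # three fixed, unshuffled folds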

Now run cross_val_score and GridSearchCV, both with these fixed folds.

>>> cross_val_score(LinearRegression(), X, y, cv=threefold)
array([-0.86060164,  0.2035956 , -0.81309259])
>>> gs = GridSearchCV(LinearRegression(), {}, cv=threefold, verbose=3).fit(X, y) 
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV]  ................................................................
[CV] ...................................... , score=-0.860602 -   0.0s
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
[CV]  ................................................................
[CV] ....................................... , score=0.203596 -   0.0s
[CV]  ................................................................
[CV] ...................................... , score=-0.813093 -   0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished

Note the score=-0.860602, score=0.203596 and score=-0.813093 in the GridSearchCV output; these are exactly the values returned by cross_val_score.
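To tie this back to the question, here is a minimal sketch, assuming a recent scikit-learn with the sklearn.model_selection module. The seed, the small fit_intercept grid and the variable names are illustrative choices, not part of the original example. It checks that best_score_ matches the mean of the per-fold scores, and that a single R^2 computed over the pooled out-of-fold predictions (what the question's loop does) is a different quantity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    cross_val_predict,
    cross_val_score,
)

# Same kind of data as above; the seed is an arbitrary assumption
rng = np.random.default_rng(0)
y = np.linspace(-5, 5, 200)
X = (y + rng.standard_normal(200)).reshape(-1, 1)
cv = KFold(n_splits=3)  # unshuffled, so both calls see identical folds

# Illustrative grid (an assumption); the default scorer is the estimator's R^2
gs = GridSearchCV(LinearRegression(), {"fit_intercept": [True, False]}, cv=cv)
gs.fit(X, y)

# Per-fold scores of the winning parameter setting
best_model = LinearRegression(**gs.best_params_)
fold_scores = cross_val_score(best_model, X, y, cv=cv)

# One R^2 over all out-of-fold predictions pooled together
pooled_pred = cross_val_predict(best_model, X, y, cv=cv)
pooled_r2 = r2_score(y, pooled_pred)

print("best_score_        :", gs.best_score_)
print("mean of fold scores:", fold_scores.mean())
print("pooled R^2         :", pooled_r2)
```

The first two numbers should agree, while the pooled R^2 generally will not: each fold's R^2 is normalized by the variance of y within that fold, whereas the pooled version is normalized by the variance of y over the whole data set.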

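The "mean" here is really a macro-average over the folds: each fold's score counts equally, regardless of how many samples the fold holds. The iid parameter of GridSearchCV can be used to get a micro-average over samples instead. Note that iid has been removed from recent scikit-learn releases, so this option applies only to older versions.

Answered 2025-04-18 by Python大师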
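The difference between the two averages can be sketched in plain Python. This assumes, as I understand the old iid=True behavior, that the micro-average weights each fold's score by its number of test samples. The fold sizes 67, 67, 66 are also an assumption, based on how 200 samples split into three folds.

```python
# Per-fold scores copied from the cross_val_score output above
fold_scores = [-0.86060164, 0.2035956, -0.81309259]
# Assumed test-fold sizes for 200 samples split into 3 folds
fold_sizes = [67, 67, 66]

# Macro-average: every fold counts equally
macro_avg = sum(fold_scores) / len(fold_scores)

# Micro-average: every sample counts equally, so larger folds weigh more
weighted_sum = sum(s * n for s, n in zip(fold_scores, fold_sizes))
micro_avg = weighted_sum / sum(fold_sizes)

print("macro:", macro_avg)
print("micro:", micro_avg)
```

With nearly equal fold sizes the two averages are close; they diverge more when the folds differ noticeably in size.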
