scikit-learn: repeating GridSearchCV multiple times

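import numpy
from sklearn.model_selection import GridSearchCV, KFold

NUM_TRIALS = 10
scores = []
for i in range(NUM_TRIALS):
    cv = KFold(n_splits=5, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)
    clf.fit(X, y)  # X, y: the training data (not shown here); best_score_ only exists after fit
    scores.append(clf.best_score_)
print("Average Score: {0} STD: {1}".format(numpy.mean(scores), numpy.std(scores)))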

2 Answers

User

#1 · edited 2024-05-29 03:45:56

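You can supply different cross-validation generators to GridSearchCV. The default for binary or multiclass classification problems is StratifiedKFold. Otherwise, it uses KFold. But you can supply your own. In your case, it looks like you want RepeatedKFold or RepeatedStratifiedKFold.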

from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Define svr here
...

# Specify cross-validation generator, in this case (10 x 5CV)
cv = RepeatedKFold(n_splits=5, n_repeats=10)
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)

# Continue as usual
clf.fit(...)
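
As a runnable illustration of the snippet above, here is a minimal sketch. The synthetic data from make_regression, the SVR settings, and the values in p_grid are assumptions made for the example and are not part of the original answer. It shows that RepeatedKFold with 5 splits and 10 repeats evaluates every candidate on 50 train/test splits, and that the spread over those splits can be read from cv_results_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

# Synthetic data, purely for illustration (assumption, not from the question)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

svr = SVR(kernel="rbf")
p_grid = {"C": [1, 10, 100]}  # hypothetical grid

# 5 folds repeated 10 times gives 50 train/test splits per candidate
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)
clf.fit(X, y)

n_splits = cv.get_n_splits()
best = clf.best_index_

# Collect the test score of the best candidate on each of the 50 splits
split_scores = np.array(
    [clf.cv_results_[f"split{k}_test_score"][best] for k in range(n_splits)]
)

print("Best C:", clf.best_params_["C"])
print("Average Score: {0:.3f} STD: {1:.3f}".format(split_scores.mean(), split_scores.std()))
```

The mean of split_scores should match clf.best_score_, since best_score_ is the mean cross-validated score of the best candidate.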

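User

#2 · edited 2024-05-29 03:45:56

This is called nested cross-validation. You can look at the official documentation example to guide you in the right direction, and also have a look at my other answer here for a similar approach.

You can adapt the steps to suit your needs: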

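from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Assumption: X_iris, y_iris are the iris data used in the documentation example
X_iris, y_iris = load_iris(return_X_y=True)

svr = SVC(kernel="rbf")
# The "..." stands for further values of your own; replace it before running
c_grid = {"C": [1, 10, 100, ...  ]}

# Other CV techniques such as "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc. can be used too
# (in sklearn.model_selection the label-based ones are named GroupKFold and LeaveOneGroupOut)

# i is the trial index / seed; wrap the code below in "for i in range(NUM_TRIALS):" to repeat it
i = 0

# To be used within GridSearch (5 in your case)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=i)

# To be used in outer CV (you asked for 10)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=i)

# Non-nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv)
clf.fit(X_iris, y_iris)
non_nested_score = clf.best_score_

# Pass the GridSearchCV estimator to cross_val_score
# This will be your required 10 x 5 CVs:
# 10 for the outer CV and 5 for GridSearchCV's internal CV
clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv)
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv).mean()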
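
To get the average and standard deviation over repeated trials, as the question asks, the nested procedure can be wrapped in a loop over seeds. The following is a minimal sketch under stated assumptions: the iris data, the concrete values in c_grid, and the name nested_scores are chosen for illustration and are not from the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

svc = SVC(kernel="rbf")
c_grid = {"C": [1, 10, 100]}  # hypothetical grid

NUM_TRIALS = 10
nested_scores = np.zeros(NUM_TRIALS)

for i in range(NUM_TRIALS):
    # A different seed per trial reshuffles both the inner and the outer folds
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=10, shuffle=True, random_state=i)

    # The grid search is refitted from scratch inside every outer fold
    clf = GridSearchCV(estimator=svc, param_grid=c_grid, cv=inner_cv)
    nested_scores[i] = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv).mean()

print("Average Score: {0:.3f} STD: {1:.3f}".format(nested_scores.mean(), nested_scores.std()))
```

Each entry of nested_scores is the mean outer-fold accuracy for one seed, so the printed standard deviation reflects how much the result depends on the random split.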

Edit - Description of nested cross-validation with cross_val_score() and GridSearchCV()

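1. clf = GridSearchCV(estimator, param_grid, cv=inner_cv).
2. Pass clf, X, y, outer_cv to cross_val_score.
3. As seen in the source code of cross_val_score, this X will be divided into X_outer_train, X_outer_test using outer_cv. The same goes for y.
4. X_outer_test will be held back, and X_outer_train will be passed on to clf for fit() (GridSearchCV in our case). From here on, assume X_outer_train is called X_inner, since it is passed to the inner estimator, and assume y_outer_train is y_inner.
5. X_inner will now be split into X_inner_train and X_inner_test using inner_cv in the GridSearchCV. The same goes for y.
6. Now the grid search estimator will be trained using X_inner_train and y_inner_train and scored using X_inner_test and y_inner_test.
7. Steps 5 and 6 will be repeated for each inner CV iteration (5 in this case).
8. The hyperparameters with the best average score over all inner iterations (X_inner_train, X_inner_test) are passed on to clf.best_estimator_, which is fitted on all the data it was given, i.e. X_outer_train.
9. This clf (gridsearch.best_estimator_) will then be scored using X_outer_test and y_outer_test.
10. Steps 3 to 9 will be repeated for each outer CV iteration (10 here), and an array of scores will be returned from cross_val_score.
11. We then use mean() to get back nested_score.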
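
The procedure described above can also be written out by hand, which makes the outer and inner splits explicit. This is only a sketch under assumptions: the iris data, the values in c_grid, and the names manual_scores and reference are illustrative and not from the original answer, and the loop mirrors what cross_val_score does for this case rather than reproducing its source code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
c_grid = {"C": [1, 10, 100]}  # hypothetical grid

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)

manual_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Outer split: the test part is held back, the train part goes to the grid search
    X_inner, y_inner = X[train_idx], y[train_idx]
    X_outer_test, y_outer_test = X[test_idx], y[test_idx]

    # Inner splits, scoring, and the refit of the best candidate on X_inner
    # all happen inside GridSearchCV.fit
    clf = GridSearchCV(SVC(kernel="rbf"), param_grid=c_grid, cv=inner_cv)
    clf.fit(X_inner, y_inner)

    # Score the refitted best estimator on the held-back outer test fold
    manual_scores.append(clf.score(X_outer_test, y_outer_test))

manual_scores = np.array(manual_scores)

# The same computation delegated to cross_val_score, for comparison
reference = cross_val_score(
    GridSearchCV(SVC(kernel="rbf"), param_grid=c_grid, cv=inner_cv),
    X, y, cv=outer_cv,
)

print("Manual nested score:", manual_scores.mean())
print("cross_val_score nested score:", reference.mean())
```

Because both cross-validators use a fixed random_state, the manual loop and cross_val_score see the same folds, so the two score arrays should agree.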
