How can I keep the CV folds fixed in sklearn?

Posted 2024-04-20 14:30:10


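I'm applying CV in several prediction tasks, and I want to use the same folds for every parameter set. If possible, I'd also like to use the same folds across different Python scripts, because the performance really depends on the folds. I'm working with sklearn's KFold: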

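from sklearn.model_selection import KFold

# random_state only takes effect when shuffle=True (newer scikit-learn raises a ValueError otherwise)
kf = KFold(n_splits=folds, shuffle=True, random_state=1986)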

and split the data into folds with

for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

and loop over them like this:

for idx_alpha, alpha in enumerate([0, 0.2, 0.4, 0.6, 0.8, 1]):
    # [...]
    for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = Y[train_index], Y[test_index]

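Although I chose a random state and set a NumPy seed, the folds are not always identical. What can I do to make them identical, and ideally share my folds across several Python scripts? Any ideas?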
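A minimal sketch of the setup described above, assuming small toy arrays in place of the real X and Y (the shapes and n_splits=5 are illustrative only). With shuffle=True and an integer random_state, each call to kf.split() regenerates the same permutation, so the splits can be computed once, stored in a list, and reused for every alpha.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data standing in for the real X and Y (hypothetical shapes)
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# shuffle=True is needed for random_state to have any effect
kf = KFold(n_splits=5, shuffle=True, random_state=1986)

# Materialise the splits once so every parameter value sees exactly the same folds
splits = list(kf.split(X, Y))

# A second call to split() with the same integer random_state reproduces the same indices
splits_again = list(kf.split(X, Y))
same = all(
    np.array_equal(tr1, tr2) and np.array_equal(te1, te2)
    for (tr1, te1), (tr2, te2) in zip(splits, splits_again)
)
print(same)  # True

for alpha in [0, 0.2, 0.4, 0.6, 0.8, 1]:
    for train_index, test_index in splits:
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
        # ... fit and score the model for this alpha here
```

Reusing the stored list also guarantees identical folds even if the splitter object is later replaced by one whose randomness is not reproducible.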


1 answer
Anonymous user
#1 · Posted 2024-04-20 14:30:10

It looks like you're reinventing GridSearchCV ;-)

Try the following:

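from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# fix random_state so the train/test split is reproducible as well
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1986)

# a bare estimator takes plain parameter names; "model__alpha" is only
# needed when the estimator is a Pipeline step named "model"
param_grid = dict(alpha=[0, 0.2, 0.4, 0.6, 0.8, 1])

model = Lasso()  # replace with the algorithm you want to use

# an integer cv uses an unshuffled KFold for regressors, so every alpha sees the same folds
folds = 3
# alternatively, build the folds yourself (random_state requires shuffle=True)
# folds = KFold(n_splits=folds, shuffle=True, random_state=1986)
grid_search = GridSearchCV(model, param_grid=param_grid, cv=folds, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)

y_pred = grid_search.best_estimator_.predict(X_test)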
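For sharing folds across scripts, one option (a sketch, not part of the original answer) is to record which fold each sample belongs to, save that array with NumPy, and hand it to GridSearchCV through scikit-learn's PredefinedSplit. The make_regression toy data, the file name fold_id.npy, and the alpha grid are assumptions for illustration; alpha=0 is dropped here because Lasso warns that it does not converge well without regularization.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, PredefinedSplit

# Toy regression data (hypothetical stand-in for the real X_train, y_train)
X, y = make_regression(n_samples=60, n_features=5, noise=0.1, random_state=0)

# Script 1: assign every sample to a fold once and save the assignment
fold_id = np.empty(len(X), dtype=int)
kf = KFold(n_splits=3, shuffle=True, random_state=1986)
for k, (_, test_index) in enumerate(kf.split(X)):
    fold_id[test_index] = k
np.save("fold_id.npy", fold_id)  # hypothetical file name

# Script 2 (or any later run): reload the assignment and use it as the CV scheme
fold_id = np.load("fold_id.npy")
cv = PredefinedSplit(test_fold=fold_id)

grid_search = GridSearchCV(Lasso(), param_grid={"alpha": [0.2, 0.4, 0.6, 0.8, 1]}, cv=cv)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Because the fold assignment is a plain integer array, it stays valid only as long as every script loads the data in the same row order. GridSearchCV's cv parameter also accepts an iterable of (train, test) index pairs, which is an alternative if you prefer to store the indices directly.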
