Stratified Group ShuffleSplit in scikit-learn

Posted 2024-05-14 23:21:56


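I would like to ask whether it is possible to do a "stratified group ShuffleSplit" in scikit-learn, in other words a combination of GroupShuffleSplit and StratifiedShuffleSplit.

Here is a sample of the code I am using: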

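from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# group-aware shuffle split: no group appears in both train and test
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(
    allr_sets_nor[:, :2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
    param_grid=param_grid, scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)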

Here I applied GroupShuffleSplit, but I would still like to add stratification based on allr_labels.


1 Answer
User
#1 · Posted 2024-05-14 23:21:56

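I solved this by applying StratifiedShuffleSplit to the groups and then manually working out the training and test set indices, since they are tied to the group index (in my case, each group contains 6 consecutive sets, from 6*index to 6*index+5).

The details are as follows: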

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# stratified splitting at the group level only:
# all_groups and all_labels hold one entry per group
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(all_groups, all_labels)

# build the cross-validation iterable: a list of 'n_splits' tuples, each holding
# two numpy arrays with the training and testing sample indices respectively
cv = []
for train_index, test_index in sss:
    # group g owns the 6 consecutive samples 6*g .. 6*g+5
    # (k is a separate offset variable so it cannot clobber a split counter)
    train_is = np.concatenate([train_index*6 + k for k in range(6)])
    test_is = np.concatenate([test_index*6 + k for k in range(6)])
    cv.append((train_is, test_is))

opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
                   scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)
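
The same idea can be sketched for groups that are not stored as contiguous blocks of 6. This is only a minimal illustration, not the answerer's code: the helper name stratified_group_shuffle_split and the synthetic groups/labels arrays are made up for the example, and it assumes that every sample in a group carries the same label, so one label per group is enough for stratification.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_group_shuffle_split(labels, groups, n_splits=5, test_size=0.25,
                                   random_state=None):
    """Yield (train_idx, test_idx) sample indices without splitting any group.

    Hypothetical helper; assumes all samples of a group share one label.
    """
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    # One entry per group: its id and the position of its first sample.
    group_ids, first_pos = np.unique(groups, return_index=True)
    group_labels = labels[first_pos]

    sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=random_state)
    # Only the length of X matters here, so a zero placeholder is enough.
    for train_g, test_g in sss.split(np.zeros(len(group_ids)), group_labels):
        # Expand the chosen group ids back to sample indices.
        train_idx = np.flatnonzero(np.isin(groups, group_ids[train_g]))
        test_idx = np.flatnonzero(np.isin(groups, group_ids[test_g]))
        yield train_idx, test_idx


# Synthetic data: 12 groups of 6 samples each, 2 classes, 6 groups per class.
groups = np.repeat(np.arange(12), 6)
labels = np.repeat(np.array([0, 1] * 6), 6)
cv = list(stratified_group_shuffle_split(labels, groups, n_splits=3,
                                         test_size=0.5, random_state=0))
```

The resulting cv list has the same shape as the one built above (a list of (train, test) index arrays), so it could be passed to GridSearchCV through the cv argument in the same way. If a K-fold scheme is acceptable instead of a shuffle split, recent scikit-learn releases also provide StratifiedGroupKFold.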
