Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

Posted 2024-04-18 22:35:27


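When I try to split with multiple classes using sklearn's StratifiedKFold, I get the error message shown below. When I split with binary classes, it works without any problem.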

num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

3 answers

Call split() like this:

for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
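
To see why the argmax call helps, here is a minimal sketch on hypothetical toy data. It uses np.eye(...)[y] as a stand-in for keras.utils.to_categorical (an assumption, chosen so the snippet needs only NumPy and scikit-learn), and the toy arrays and the fold_sizes / raised names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 12 samples, 3 classes, 4 samples per class.
y_train = np.array([0, 1, 2] * 4)
x_train = np.arange(24, dtype=float).reshape(12, 2)

# Stand-in for keras.utils.to_categorical (assumption): one row per sample,
# with a single 1 in the column of that sample's class.
y_train_categorical = np.eye(3)[y_train]

kf = StratifiedKFold(n_splits=4, shuffle=True, random_state=999)

# Passing the one-hot matrix reproduces the error from the question.
try:
    next(kf.split(x_train, y_train_categorical))
    raised = False
except ValueError:
    raised = True

# argmax along axis 1 returns the column index of the 1, i.e. the class label.
recovered = y_train_categorical.argmax(1)

# With 1-D labels the split works and every validation fold is stratified.
fold_sizes = [len(val_index) for _, val_index in kf.split(x_train, recovered)]
```

Since argmax(1) simply undoes the one-hot encoding, passing the original y_train to split() should give the same folds.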

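keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with this kind of input; from the split method docs: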

split(X, y, groups=None)

[...]

y : array-like, shape (n_samples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.

In other words, your y must be a 1-D array of your class labels.

Essentially, all you have to do is reverse the order of the operations: split first (using the initial y_train), and only then convert with to_categorical.
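
A rough sketch of that split-first ordering, on made-up data. The one_hot helper is hypothetical and only stands in for keras.utils.to_categorical so that the example runs without Keras; the folds list is likewise just for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def one_hot(labels, num_classes):
    """Hypothetical stand-in for keras.utils.to_categorical."""
    return np.eye(num_classes)[labels]


# Hypothetical toy data: 20 samples with class counts 10 / 5 / 5.
y_train = np.array([0] * 10 + [1] * 5 + [2] * 5)
x_train = np.random.default_rng(0).normal(size=(20, 3))
num_classes = len(np.unique(y_train))

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

folds = []
# Stratify on the original 1-D integer labels ...
for train_index, val_index in kf.split(x_train, y_train):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    # ... and one-hot encode only afterwards, once per fold.
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    folds.append((x_train_kf, y_train_kf, x_val_kf, y_val_kf))
```

Each validation fold then keeps the class ratio of the full set (here 2 : 1 : 1), while the model still receives one-hot targets.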

I ran into the same problem and found that you can check the type of your target with this util function:

from sklearn.utils.multiclass import type_of_target
type_of_target(y)

'multilabel-indicator'

From its docstring:

  • 'binary': y contains <= 2 discrete values and is 1d or a column vector.
  • 'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
  • 'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
  • 'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
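
To make the four categories above concrete, here is a small sketch that runs type_of_target on hand-made arrays. The arrays and the target_types dictionary are illustrative only; the one-hot matrix is built with np.eye as a stand-in for the output of keras.utils.to_categorical.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical example targets, one per category from the docstring.
labels = np.array([0, 1, 2, 1])
examples = {
    "two_classes_1d": np.array([0, 1, 1, 0]),
    "three_classes_1d": labels,
    "two_columns_many_values": np.array([[1, 2], [3, 1]]),
    # One-hot rows: 2-D, three columns, only the values 0 and 1.
    "one_hot": np.eye(3, dtype=int)[labels],
}

target_types = {name: type_of_target(y) for name, y in examples.items()}

for name, kind in target_types.items():
    print(f"{name}: {kind}")
```

Only the first two kinds are accepted by StratifiedKFold, which matches the wording of the error message.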

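With LabelEncoder you can convert your classes into a 1-D array of numbers (assuming your target labels are in a 1-D array of categoricals/objects):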

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)
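
As a quick illustration with made-up string labels (the target_labels values below are hypothetical), the encoded array is 1-D, is reported as 'multiclass', and can be mapped back to the original names:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Hypothetical 1-D array of string class labels.
target_labels = np.array(["cat", "dog", "bird", "cat", "bird", "dog"])

label_encoder = LabelEncoder()
# Classes are sorted, so bird -> 0, cat -> 1, dog -> 2.
y = label_encoder.fit_transform(target_labels)

# The encoded target is a 1-D integer array with more than two classes.
kind = type_of_target(y)

# inverse_transform maps the integers back to the original labels.
decoded = label_encoder.inverse_transform(y)
```

The resulting y can be passed as the second argument to StratifiedKFold.split.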
