Error in scikit-learn cross_val_score

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-4a10affe67c7> in <module>()
      1 # evaluate the model using 10-fold cross-validation
----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
      3 print scores
      4 print scores.mean()

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
   1140                     allow_nans=True, allow_nd=True)
   1141
-> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
   1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
   1144     # We clone the estimator to make sure that all the folds are

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
   1366     if classifier:
   1367         if type_of_target(y) in ['binary', 'multiclass']:
-> 1368             cv = StratifiedKFold(y, cv, indices=needs_indices)
   1369         else:
   1370             cv = KFold(_num_samples(y), cv, indices=needs_indices)

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    428         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    429             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430                 label_test_folds = test_folds[y == label]
    431                 # the test split can be too big because we used
    432                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

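2 answers

User

#1 · edited 2024-05-23 14:01:09

I ran into the same error and found this question while looking for an answer.

I am using the same sklearn.cross_validation.cross_val_score (only the algorithm differs) on the same kind of machine: Windows 7, 64-bit.

I tried your solution from above and it "worked", but it gave me the following warning: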

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  estimator.fit(X_train, y_train, **fit_params)

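After reading the warning, I realized the problem was the shape of "y" (my label column). The keyword to try from the warning is "ravel()". So I tried the following: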

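# `label` is a one-column DataFrame; .values replaces as_matrix(), which was removed in pandas 1.0
y_arr = label.values
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method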

This gave me:

  [[1]
   [0]
   [1]
   ...,
   [0]
   [0]
   [1]]

  (87939, 1)

When I added "ravel()":

# ravel() flattens the (n_samples, 1) column vector to shape (n_samples,)
y_arr = label.values.ravel()
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method

it gave me:

[1 0 1 ..., 0 0 1]

(87939,)

The shape of "y_arr" must be (87939,) rather than (87939, 1). After that, my original cross_val_score call worked without adding the KFold code.
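
As a rough illustration of why the shape matters, here is a minimal NumPy-only sketch. It assumes a small made-up label array and models `test_folds` from the traceback as a 1-D array of zeros; the names `y_col`, `y_flat`, and `mask_error` are hypothetical and are not part of scikit-learn. Indexing a 1-D array with the 2-D boolean mask `y == label` is what fails, and flattening `y` with ravel() avoids it.

```python
import numpy as np

# Hypothetical stand-ins: a column-vector label array like the one above,
# and a 1-D per-sample array playing the role of `test_folds` in the traceback.
y_col = np.array([[1], [0], [1], [0], [0], [1]])   # shape (6, 1)
test_folds = np.zeros(len(y_col), dtype=int)       # shape (6,)


def mask_error(y):
    """Return the IndexError message raised by test_folds[y == 1], or None."""
    try:
        test_folds[y == 1]
        return None
    except IndexError as exc:
        return str(exc)


err_2d = mask_error(y_col)    # 2-D boolean mask applied to a 1-D array
y_flat = y_col.ravel()        # flatten to shape (6,)
err_1d = mask_error(y_flat)   # 1-D boolean mask applied to a 1-D array

print(y_col.shape, y_flat.shape)
print("column vector:", err_2d)
print("after ravel():", err_1d)
```

Note that in current scikit-learn releases cross_val_score is imported from sklearn.model_selection; the sklearn.cross_validation module shown in the traceback belongs to older versions.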

Hope this helps.

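User

#2 · edited 2024-05-23 14:01:09

I know this answer is late, but it may help others get past the same error.
I had the same problem with Python 3.6. After switching from 3.6 to 3.5, I was able to use the function.
Here is the example I ran: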

accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

First, create a conda env with Python 3.5:

conda create -n py35 python=3.5  
source activate py35

Hope this helps us move forward.
