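SKLearn cross-validation error -- TypeError
I'm trying to cross-validate the results of my KNN classifier. I used the following code, but it raises a TypeError.
For context, I have already imported the scikit-learn, NumPy, and pandas libraries.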
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cross_validation import cross_val_score, ShuffleSplit
n_samples = len(y)
knn = KNeighborsClassifier(3)
cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
test_scores = cross_val_score(knn, X, y, cv=cv)
test_scores.mean()
This returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-139-d8cc3ee0c29b> in <module>()
7 cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
8
9 test_scores = cross_val_score(knn, X, y, cv=cv)
10 test_scores.mean()
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
1150 delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
1151 verbose, fit_params)
1152 for train, test in cv)
1153 return np.array(scores)
1154
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
515 try:
516 for function, args, kwargs in iterable:
517 self.dispatch(function, args, kwargs)
518
519 self.retrieve()
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
310 """
311 if self._pool is None:
312 job = ImmediateApply(func, args, kwargs)
313 index = len(self._jobs)
314 if not _verbosity_filter(index, self.verbose):
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
134 # Don't delay the application, to avoid keeping the input
135 # arguments in memory
136 self.results = func(*args, **kwargs)
137
138 def get(self):
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _cross_val_score(estimator, X, y, scorer, train, test, verbose, fit_params)
1056 y_test = None
1057 else:
1058 y_train = y[train]
1059 y_test = y[test]
1060 estimator.fit(X_train, y_train, **fit_params)
TypeError: only integer arrays with one element can be converted to an index
1 Answer
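This error is related to pandas. scikit-learn expects NumPy arrays, sparse matrices, or objects that behave like them.
The main problem with a pandas DataFrame is that indexing it with [...] selects columns rather than rows. Rows have to be selected with DataFrame.loc[...] (by label) or DataFrame.iloc[...] (by position). scikit-learn does not expect this behavior, so the error probably occurs at line 1058 of the traceback, where the code tries to extract the training samples with y[train].
To fix this, if your y is a column of a DataFrame, try converting it to an array: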
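To make the indexing difference concrete, here is a minimal sketch. The toy DataFrame and the `train` index array are made up for illustration (the asker's real data is not shown), and the exact exception raised by the bracket lookup may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's target column.
df = pd.DataFrame({"label": [0, 1, 0, 1, 1]})
train = np.array([0, 2, 4])  # positional indices, like those a CV splitter yields

# Plain [...] on a DataFrame looks the indices up as column labels,
# so this fails instead of returning rows 0, 2 and 4.
try:
    df[train]
    bracket_selected_rows = True
except (KeyError, TypeError, IndexError, ValueError):
    bracket_selected_rows = False

rows_by_position = df.iloc[train]            # explicit positional row selection
rows_from_array = df["label"].values[train]  # NumPy array: [...] indexes rows

print(bracket_selected_rows)
print(rows_from_array)
```

Once the data is a plain NumPy array, `y[train]` does what scikit-learn assumes it does.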
y = y.values
Otherwise, you could take a look at the sklearn-pandas library.
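For completeness, here is a rough sketch of the whole cross-validation with the `.values` conversion applied. It assumes a recent scikit-learn release, where `sklearn.cross_validation` is no longer available and `ShuffleSplit` lives in `sklearn.model_selection` with an `n_splits` argument. The toy DataFrame and its column names (`f1`, `f2`, `label`) are invented for the example, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data; the asker's real X and y are not shown.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.rand(60, 2), columns=["f1", "f2"])
data["label"] = (data["f1"] > 0.5).astype(int)

X = data[["f1", "f2"]].values  # 2-D NumPy array of features
y = data["label"].values       # 1-D NumPy array of targets

knn = KNeighborsClassifier(n_neighbors=3)
# In sklearn.model_selection, ShuffleSplit takes n_splits instead of
# (n_samples, n_iter=...).
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
test_scores = cross_val_score(knn, X, y, cv=cv)
print(test_scores.mean())
```

On the old `sklearn.cross_validation` API from the question, the same `y = y.values` line is the only change needed.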