Difference between cross_val_score and an estimator's score?
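I am using scikit-learn. I tried running cross-validation two ways: with a manual loop over the folds, and with the cross_validation.cross_val_score shortcut. The two approaches give different results. Why is that?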
import numpy as np
from sklearn import cross_validation, datasets, svm
digits = datasets.load_digits()
X = digits.data
y = digits.target
svc = svm.SVC(kernel='linear')
kfold = cross_validation.KFold(len(X))
scores = [svc.fit(X[train], y[train]).score(X[test], y[test]) for train, test in kfold]
#scores output: [0.93489148580968284, 0.95659432387312182, 0.93989983305509184]
cross_validation.cross_val_score(svc, X, y)
#output: array([ 0.98 , 0.982, 0.983])
1 Answer
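According to the documentation for cross_val_score, it performs stratified cross-validation when you pass it a target vector of class labels (integers). The manual loop in the question uses a plain KFold, so the two approaches split the data differently. Using StratifiedKFold in the manual loop reproduces the scores from cross_val_score: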
>>> kfold = cross_validation.StratifiedKFold(y)
>>> [svc.fit(X[train], y[train]).score(X[test], y[test])
... for train, test in kfold]
[0.93521594684385378, 0.95826377295492482, 0.93791946308724827]
>>> cross_validation.cross_val_score(svc, X, y)
array([ 0.93521595, 0.95826377, 0.93791946])
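
For readers on a recent scikit-learn release, here is a minimal sketch of the same comparison. It assumes the `sklearn.model_selection` module, which replaced `sklearn.cross_validation` (removed in version 0.20); the helper name `manual_scores` is made up for illustration. The number of folds is set to 3 explicitly to match the example above, since the default has changed between versions.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

digits = datasets.load_digits()
X, y = digits.data, digits.target
svc = svm.SVC(kernel="linear")


def manual_scores(estimator, X, y, splitter):
    """Hypothetical helper: fit a fresh clone per fold, score on the held-out fold."""
    return np.array([
        clone(estimator).fit(X[train], y[train]).score(X[test], y[test])
        for train, test in splitter.split(X, y)
    ])


# Unshuffled splitters, so the folds are deterministic.
stratified = StratifiedKFold(n_splits=3)
plain = KFold(n_splits=3)

manual_stratified = manual_scores(svc, X, y, stratified)
manual_plain = manual_scores(svc, X, y, plain)

# With a classifier and an integer cv, cross_val_score stratifies the folds.
auto_default = cross_val_score(svc, X, y, cv=3)
# Passing a splitter object overrides that choice.
auto_plain = cross_val_score(svc, X, y, cv=plain)

print("manual, StratifiedKFold:", manual_stratified)
print("cross_val_score, cv=3:  ", auto_default)
print("manual, KFold:          ", manual_plain)
print("cross_val_score, KFold: ", auto_plain)
```

If the goal is to get the unstratified behaviour from cross_val_score, passing a KFold instance as `cv` should do it, as the last call shows.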