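I'm building an MLP classification model in scikit-learn. I use GridSearchCV with roc_auc to score the model. The mean train and test scores are around 0.76, which is not bad. The output of cv_results_ is: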
Train set AUC: 0.553465272412
Grid best score (AUC): 0.757236688092
Grid best parameter (max. AUC): {'hidden_layer_sizes': 10}
{ 'mean_fit_time': array([63.54, 136.37, 136.32, 119.23, 121.38, 124.03]),
'mean_score_time': array([ 0.04, 0.04, 0.04, 0.05, 0.05, 0.06]),
'mean_test_score': array([ 0.76, 0.74, 0.75, 0.76, 0.76, 0.76]),
'mean_train_score': array([ 0.76, 0.76, 0.76, 0.77, 0.77, 0.77]),
'param_hidden_layer_sizes': masked_array(data = [5 (5, 5) (5, 10) 10 (10, 5) (10, 10)],
mask = [False False False False False False],
fill_value = ?)
,
'params': [ {'hidden_layer_sizes': 5},
{'hidden_layer_sizes': (5, 5)},
{'hidden_layer_sizes': (5, 10)},
{'hidden_layer_sizes': 10},
{'hidden_layer_sizes': (10, 5)},
{'hidden_layer_sizes': (10, 10)}],
'rank_test_score': array([ 2, 6, 5, 1, 4, 3]),
'split0_test_score': array([ 0.76, 0.75, 0.75, 0.76, 0.76, 0.76]),
'split0_train_score': array([ 0.76, 0.75, 0.75, 0.76, 0.76, 0.76]),
'split1_test_score': array([ 0.77, 0.76, 0.76, 0.77, 0.76, 0.76]),
'split1_train_score': array([ 0.76, 0.75, 0.75, 0.76, 0.76, 0.76]),
'split2_test_score': array([ 0.74, 0.72, 0.73, 0.74, 0.74, 0.75]),
'split2_train_score': array([ 0.77, 0.77, 0.77, 0.77, 0.77, 0.77]),
'std_fit_time': array([47.59, 1.29, 1.86, 3.43, 2.49, 9.22]),
'std_score_time': array([ 0.01, 0.01, 0.01, 0.00, 0.00, 0.01]),
'std_test_score': array([ 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]),
'std_train_score': array([ 0.01, 0.01, 0.01, 0.01, 0.01, 0.00])}
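As you can see, I used k = 3 folds. Interestingly, the roc_auc_score I computed by hand on the training set comes out at 0.55, while the mean train score is reported as ~0.76.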
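For orientation, here is a minimal sketch of a GridSearchCV setup that yields a cv_results_ dictionary of the same shape. The synthetic data from make_classification, max_iter=50 and the other MLPClassifier settings are assumptions, not the original configuration. In recent scikit-learn versions return_train_score=True has to be passed explicitly for the mean_train_score and split*_train_score keys to appear, and scoring="roc_auc" scores each fold from predict_proba, because MLPClassifier has no decision_function.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real data (assumption): 15 predictors, binary target
X, y = make_classification(n_samples=600, n_features=15, random_state=42)

# Same hidden_layer_sizes grid as in the cv_results_ output above
param_grid = {"hidden_layer_sizes": [5, (5, 5), (5, 10), 10, (10, 5), (10, 10)]}

grid = GridSearchCV(
    MLPClassifier(solver="adam", max_iter=50, random_state=42),
    param_grid,
    scoring="roc_auc",        # ranks each fold by predict_proba scores
    cv=3,                     # 3 folds, as in the question
    return_train_score=True,  # needed for mean_train_score in recent versions
)

with warnings.catch_warnings():
    # max_iter is kept small for speed, so silence convergence warnings
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    grid.fit(X, y)

print("Grid best score (AUC):", grid.best_score_)
print("Grid best parameter (max. AUC):", grid.best_params_)
print("mean_train_score:", grid.cv_results_["mean_train_score"])
```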
Because of this discrepancy, I decided to "simulate" the GridSearchCV routine and got the following results:
Shape X_train: (107119, 15)
Shape y_train: (107119,)
Shape X_val: (52761, 15)
Shape y_val: (52761,)
```
          layers       roc-auc
  Seq   l1   l2    train    test   iters   runtime
    1    5    0   0.5522  0.5488      85     20.54
    2    5    5   0.5542  0.5513      80     27.10
    3    5   10   0.5544  0.5521      83     28.56
    4   10    0   0.5532  0.5516      61     15.24
    5   10    5   0.5540  0.5518      54     19.86
    6   10   10   0.5507  0.5474      56     21.09
```
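The scores are all around 0.55, which matches the value I computed by hand. What surprised me is that the results hardly vary. It looks like I made a mistake somewhere, but I can't find it. Here is the code: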
```python
import time

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def simple_mlp(X, y, verbose=True, random_state=42):
    def do_mlp(X_t, X_v, y_t, y_v, n, l1, l2=None):
        # One hidden layer of size l1, or two hidden layers (l1, l2)
        if l2 is None:
            layers = (l1,)
            l2 = 0
        else:
            layers = (l1, l2)
        t = time.time()
        mlp = MLPClassifier(solver='adam', learning_rate_init=1e-4,
                            hidden_layer_sizes=layers,
                            max_iter=200,
                            verbose=False,
                            random_state=random_state)
        mlp.fit(X_t, y_t)
        # Note: predict() returns hard 0/1 class labels, not probabilities
        y_hat_train = mlp.predict(X_t)
        y_hat_val = mlp.predict(X_v)
        if verbose:
            av = 'samples'
            acc_trn = roc_auc_score(y_t, y_hat_train, average=av)
            acc_tst = roc_auc_score(y_v, y_hat_val, average=av)
            print("{:5d}{:4d}{:4d}{:7.4f}{:7.4f}{:9d}{:8.2f}"
                  .format(n, l1, l2, acc_trn, acc_tst, mlp.n_iter_, time.time() - t))
        return mlp, n + 1

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.33, random_state=random_state)
    if verbose:
        print('Shape X_train:', X_train.shape)
        print('Shape y_train:', y_train.shape)
        print('Shape X_val:', X_val.shape)
        print('Shape y_val:', y_val.shape)

    # MLP requires scaling of all predictors
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_val = scaler.transform(X_val)

    n = 1
    layers1 = [5, 10]
    layers2 = [5, 10]
    if verbose:
        print(" layers roc-auc")
        print(" Seq l1 l2 train validation iters runtime")
    for l1 in layers1:
        # single hidden layer first, then every (l1, l2) combination
        mlp, n = do_mlp(X_train, X_val, y_train, y_val, n, l1)
        for l2 in layers2:
            mlp, n = do_mlp(X_train, X_val, y_train, y_val, n, l1, l2)
    return mlp
```
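One difference between the two setups that is easy to overlook: GridSearchCV's "roc_auc" scorer evaluates the continuous scores from predict_proba (or decision_function where an estimator has one), whereas do_mlp above passes the hard 0/1 labels returned by predict() to roc_auc_score. The sketch below computes both values for the same fitted model; the synthetic data and MLP settings are assumptions, not the original dataset.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced and noisy stand-in data (assumption)
X, y = make_classification(n_samples=1000, n_features=15, weights=[0.7, 0.3],
                           flip_y=0.2, random_state=0)
X_t, X_v, y_t, y_v = train_test_split(X, y, test_size=0.33, random_state=42)

mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=200, random_state=42)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp.fit(X_t, y_t)

# AUC from hard labels: the ROC curve has a single operating point
auc_labels = roc_auc_score(y_v, mlp.predict(X_v))
# AUC from the positive-class probability, as the "roc_auc" scorer uses for MLPs
auc_proba = roc_auc_score(y_v, mlp.predict_proba(X_v)[:, 1])

print(f"AUC from predict():       {auc_labels:.4f}")
print(f"AUC from predict_proba(): {auc_proba:.4f}")
```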
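I used exactly the same data in both cases (159,880 observations and 15 predictors). I used cv=3 (the default in scikit-learn before version 0.22) for GridSearchCV, and the same proportion for the validation set in my hand-written code.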
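"The same proportion" does not mean "the same rows". With an integer cv and a classifier, GridSearchCV uses StratifiedKFold without shuffling, so each test fold is a consecutive block of rows within each class, while train_test_split shuffles before holding out a third. A toy sketch with 12 rows (the data is an assumption chosen to make row positions visible):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

X = np.arange(12).reshape(-1, 1)  # each row's value is its row number
y = np.array([0, 1] * 6)          # alternating binary labels

# What cv=3 means for a classifier: StratifiedKFold, no shuffling
skf = StratifiedKFold(n_splits=3)
fold_test_rows = [test_idx.tolist() for _, test_idx in skf.split(X, y)]
for fold, rows in enumerate(fold_test_rows):
    print(f"fold {fold}: test rows {rows}")

# train_test_split shuffles by default before holding out ~1/3 of the rows
_, X_val, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
print("train_test_split validation rows:", np.sort(X_val.ravel()).tolist())
```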
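While looking for an answer, I found this post on SO describing the same problem. Nobody answered it. Maybe someone knows what exactly is going on?

Thanks for your time.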
EDIT
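As @Mohammed Kashif suggested, I checked the code of GridSearchCV and KFold, and indeed found an explicit note that KFold does not shuffle the data. So I added the following code to model_mlp, before the scaler: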
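```python
import numpy as np

np.random.seed(random_state)
index = np.random.permutation(len(X_train))
# Note: only X_train is reordered here; y_train keeps its original row order
X_train = X_train.iloc[index]
```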
and replaced train_test_split in simple_mlp with:
```python
import numpy as np

np.random.seed(random_state)
index = np.random.permutation(len(X))
X = X.iloc[index]
y = y.iloc[index]
train_size = int(2 * len(X) / 3.0)  # sample of 2 thirds
X_train = X.iloc[:train_size]
X_val = X.iloc[train_size:]
y_train = y.iloc[:train_size]
y_val = y.iloc[train_size:]
```
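The same 2/3 split can be packaged as a helper so that X and y are always reordered with one shared index and every row keeps its label. This is a sketch: shuffled_two_thirds_split is a hypothetical name, and it uses a local np.random.RandomState instead of the global seed, so the permutation does not depend on other code touching NumPy's global random state.

```python
import numpy as np
import pandas as pd


def shuffled_two_thirds_split(X, y, random_state=42):
    """Hypothetical helper: permute X and y together, then split 2/3 : 1/3."""
    rng = np.random.RandomState(random_state)
    index = rng.permutation(len(X))
    # One positional index for both, so rows stay paired with their labels
    X, y = X.iloc[index], y.iloc[index]
    train_size = int(2 * len(X) / 3.0)
    return (X.iloc[:train_size], X.iloc[train_size:],
            y.iloc[:train_size], y.iloc[train_size:])


# Usage on a small toy frame (assumed data)
X_demo = pd.DataFrame({"a": range(9), "b": range(10, 19)})
y_demo = pd.Series([0, 1, 0, 1, 1, 0, 0, 1, 0])
X_train, X_val, y_train, y_val = shuffled_two_thirds_split(X_demo, y_demo)
print(X_train.index.tolist(), y_train.index.tolist())
```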
The results are:
Train set AUC: 0.5
Grid best score (AUC): 0.501410198106
Grid best parameter (max. AUC): {'hidden_layer_sizes': (5, 10)}
{ 'mean_fit_time': array([28.62, 46.00, 54.44, 46.74, 55.25, 53.33]),
'mean_score_time': array([ 0.04, 0.05, 0.05, 0.05, 0.05, 0.06]),
'mean_test_score': array([ 0.50, 0.50, 0.50, 0.50, 0.50, 0.50]),
'mean_train_score': array([ 0.50, 0.51, 0.51, 0.51, 0.50, 0.51]),
'param_hidden_layer_sizes': masked_array(data = [5 (5, 5) (5, 10) 10 (10, 5) (10, 10)],
mask = [False False False False False False],
fill_value = ?)
,
'params': [ {'hidden_layer_sizes': 5},
{'hidden_layer_sizes': (5, 5)},
{'hidden_layer_sizes': (5, 10)},
{'hidden_layer_sizes': 10},
{'hidden_layer_sizes': (10, 5)},
{'hidden_layer_sizes': (10, 10)}],
'rank_test_score': array([ 6, 2, 1, 4, 5, 3]),
'split0_test_score': array([ 0.50, 0.50, 0.51, 0.50, 0.50, 0.50]),
'split0_train_score': array([ 0.50, 0.51, 0.50, 0.51, 0.50, 0.51]),
'split1_test_score': array([ 0.50, 0.50, 0.50, 0.50, 0.49, 0.50]),
'split1_train_score': array([ 0.50, 0.50, 0.51, 0.50, 0.51, 0.51]),
'split2_test_score': array([ 0.49, 0.50, 0.49, 0.50, 0.50, 0.50]),
'split2_train_score': array([ 0.51, 0.51, 0.51, 0.51, 0.50, 0.51]),
'std_fit_time': array([19.74, 19.33, 0.55, 0.64, 2.36, 0.65]),
'std_score_time': array([ 0.01, 0.01, 0.00, 0.01, 0.00, 0.01]),
'std_test_score': array([ 0.01, 0.00, 0.01, 0.00, 0.00, 0.00]),
'std_train_score': array([ 0.00, 0.00, 0.00, 0.00, 0.00, 0.00])}
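This seems to confirm what Mohammed said. I must say I was quite skeptical at first, because I couldn't imagine that randomization would have such a large effect on such a large dataset, which doesn't look ordered in any way.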
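Still, I have some doubts. In the original setup GridSearchCV was consistently about 0.20 too high, and now it is consistently about 0.05 too low. That is an improvement, since the deviation between the two methods has shrunk by a factor of 4. Is there an explanation for this last finding, or is a deviation of about 0.05 between the two methods just noise? I decided to mark this as the correct answer, but I hope someone can shed some light on this.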
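One way to get a feel for whether a gap of about 0.05 is noise is to measure how much the cross-validated AUC itself moves when only the shuffling of the folds changes. A sketch with RepeatedStratifiedKFold; the synthetic data and MLP settings are assumptions, and on the real 160k-row dataset the spread may well be different.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic, noisy stand-in data (assumption)
X, y = make_classification(n_samples=600, n_features=15, flip_y=0.3, random_state=0)

# 3-fold CV repeated 5 times, each repeat with a different shuffle
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=100, random_state=42)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    scores = cross_val_score(mlp, X, y, scoring="roc_auc", cv=cv)

# Scores come back in split order: 3 folds of repeat 0, then repeat 1, ...
per_repeat = scores.reshape(5, 3).mean(axis=1)
print("fold AUCs:", np.round(scores, 3))
print("per-repeat mean AUC:", np.round(per_repeat, 3))
print("spread of per-repeat means:", per_repeat.max() - per_repeat.min())
```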
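The difference in scores is mainly due to the different ways in which GridSearchCV and the function that simulates it split the dataset. Think of it this way: suppose the dataset has 9 data points. GridSearchCV may split them into training and test folds one way, while the function that simulates GridSearchCV may split them quite differently.

With a different split of the dataset, the classifier trained on it may behave completely differently. (It may also behave the same way; it all depends on the data points and various other factors, such as how correlated they are, whether they help the model discriminate between the data points, and so on.)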
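The 9-point thought experiment can be made concrete with KFold. In this sketch, without shuffling the folds are consecutive blocks of rows; with shuffle=True (random_state=0 is an arbitrary choice) the same 9 points land in different folds, so each classifier would be trained on different data.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(9).reshape(-1, 1)  # 9 data points, identified by row number

ordered = KFold(n_splits=3)
shuffled = KFold(n_splits=3, shuffle=True, random_state=0)

ordered_folds = [test.tolist() for _, test in ordered.split(X)]
shuffled_folds = [test.tolist() for _, test in shuffled.split(X)]
print("without shuffling:", ordered_folds)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print("with shuffling:   ", shuffled_folds)
```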
So, to simulate GridSearchCV exactly, you need to split the data in exactly the same way.
If you check the GridSearchCV source, you will find that at line 592 it calls another function, specified at this link, to perform the CV. That function actually calls KFold or StratifiedKFold.
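To see which splitter an integer cv turns into, scikit-learn exposes sklearn.model_selection.check_cv; in current versions this is, as far as I can tell, the helper GridSearchCV relies on (source line numbers differ between releases). A small sketch: for a classifier with a binary or multiclass target, cv=3 becomes StratifiedKFold(n_splits=3), otherwise KFold(n_splits=3), and neither shuffles by default. The toy target is an assumption.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1])  # toy binary target

# How an integer cv=3 is resolved for a classifier vs. a non-classifier
cv_clf = check_cv(3, y, classifier=True)
cv_other = check_cv(3, y, classifier=False)

print(type(cv_clf).__name__, cv_clf.shuffle)      # StratifiedKFold False
print(type(cv_other).__name__, cv_other.shuffle)  # KFold False
```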
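So, for your experiment, I suggest explicitly performing CV on the dataset with a fixed random seed and one of the splitters mentioned above (KFold or StratifiedKFold), and then using the same CV object in your simulation function to get a more comparable analysis. You should then get values that agree more closely.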
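A minimal sketch of that suggestion, assuming synthetic data and a single hidden_layer_sizes value: one StratifiedKFold object with shuffle=True and a fixed random_state is passed to GridSearchCV and also drives a hand-written loop. Because both use the same folds, the same MLP seed and the same probability-based AUC, the two mean test scores should agree up to floating-point error.

```python
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=600, n_features=15, random_state=1)

# One shared, seeded splitter: identical folds every time split() is called
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=100, random_state=42)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    grid = GridSearchCV(mlp, {"hidden_layer_sizes": [(10,)]},
                        scoring="roc_auc", cv=cv)
    grid.fit(X, y)

    # Hand-written "simulation" over the very same folds
    manual = []
    for train_idx, test_idx in cv.split(X, y):
        model = clone(mlp).fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        manual.append(roc_auc_score(y[test_idx], proba))

print("GridSearchCV mean test AUC:", grid.cv_results_["mean_test_score"][0])
print("manual mean test AUC:      ", np.mean(manual))
```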