scikit-learn SGDClassifier warm start is ignored

3 votes
1 answer
2,327 views
Asked 2025-04-18 21:39

I'm trying to use SGDClassifier from scikit-learn version 0.15.1. There seems to be no way to set a convergence criterion other than the number of iterations, so I want to check the error manually after each pass and keep running additional iterations until the improvement becomes small enough.

Unfortunately, the warm_start flag and coef_init/intercept_init don't seem to actually warm-start the optimization: they all appear to start from scratch.

What can I do? Without a real convergence criterion or a working warm start, this classifier is unusable for me.

Note in the output below how the bias jumps sharply at each restart, and the loss increases as well, only to come back down with further iterations. After 250 iterations the bias is -3.44 and the average loss is 1.46.

sgd = SGDClassifier(loss='log', alpha=alpha, verbose=1, shuffle=True,
                    warm_start=True)
print('INITIAL FIT')
sgd.fit(X, y, sample_weight=sample_weight)
sgd.n_iter = 1  # change the per-call epoch count in place
print('\nONE MORE ITERATION')
sgd.fit(X, y, sample_weight=sample_weight)
sgd.n_iter = 3
print('\nTHREE MORE ITERATIONS')
sgd.fit(X, y, sample_weight=sample_weight)


INITIAL FIT
-- Epoch 1
Norm: 254.11, NNZs: 92299, Bias: -5.239955, T: 122956, Avg. loss: 28.103236
Total training time: 0.04 seconds.
-- Epoch 2
Norm: 138.81, NNZs: 92598, Bias: -5.180938, T: 245912, Avg. loss: 16.420537
Total training time: 0.08 seconds.
-- Epoch 3
Norm: 100.61, NNZs: 92598, Bias: -5.082776, T: 368868, Avg. loss: 12.240537
Total training time: 0.12 seconds.
-- Epoch 4
Norm: 74.18, NNZs: 92598, Bias: -5.076395, T: 491824, Avg. loss: 9.859404
Total training time: 0.17 seconds.
-- Epoch 5
Norm: 55.57, NNZs: 92598, Bias: -5.072369, T: 614780, Avg. loss: 8.280854
Total training time: 0.21 seconds.

ONE MORE ITERATION
-- Epoch 1
Norm: 243.07, NNZs: 92598, Bias: -11.271497, T: 122956, Avg. loss: 26.148746
Total training time: 0.04 seconds.

THREE MORE ITERATIONS
-- Epoch 1
Norm: 258.70, NNZs: 92598, Bias: -16.058395, T: 122956, Avg. loss: 29.666688
Total training time: 0.04 seconds.
-- Epoch 2
Norm: 142.24, NNZs: 92598, Bias: -15.809559, T: 245912, Avg. loss: 17.435114
Total training time: 0.08 seconds.
-- Epoch 3
Norm: 102.71, NNZs: 92598, Bias: -15.715853, T: 368868, Avg. loss: 12.731181
Total training time: 0.12 seconds.

1 Answer

6

With warm_start=True, fit does continue from the previously trained coefficients, but it restarts the learning-rate schedule, which is why the loss spikes at each restart.

If you want to check for convergence manually, use partial_fit instead of fit, as @AdrienNK mentioned:

sgd = SGDClassifier(loss='log', alpha=alpha, verbose=1, shuffle=True,
                    warm_start=True, n_iter=1)
sgd.partial_fit(X, y)
# after 1st iteration
sgd.partial_fit(X, y)
# after 2nd iteration
...
