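Sklearn, grid search: how to print progress during execution?
I am using GridSearch from sklearn to optimize the parameters of a classifier. There is a lot of data, so the whole optimization takes a long time: more than a day. I would like to watch the performance of the parameter combinations that have already been tried while the search is still running. Is that possible?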
4 Answers
15
Take a look at GridSearchCVProgressBar.
I only just discovered it and am using it right now. I really like it:
In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar
In [2]:
In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
    """Monkey patch Parallel to have a progress bar during grid search"""
    def _get_param_iterator(self):
        """Return ParameterGrid instance for the given param_grid"""
        iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
        iterator = list(iterator)
        n_candidates = len(iterator)
        cv = model_selection._split.check_cv(self.cv, None)
        n_splits = getattr(cv, 'n_splits', 3)
        max_value = n_candidates * n_splits
        class ParallelProgressBar(Parallel):
            def __call__(self, iterable):
                bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                iterable = bar(iterable)
                return super(ParallelProgressBar, self).__call__(iterable)
        # Monkey patch
        model_selection._search.Parallel = ParallelProgressBar
        return iterator
File: ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type: ABCMeta
In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring: Monkey patch Parallel to have a progress bar during grid search
File: ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type: ABCMeta
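The source shown above sizes its progress bar as n_candidates * n_splits. As a rough illustration of that count only (not of pactools itself), here is a stdlib-only sketch for a dict-style parameter grid. The helper name count_fits and the example grid are hypothetical, made up for this illustration.

```python
def count_fits(param_grid, n_splits):
    """Return (n_candidates, total_fits) for a dict-style parameter grid.

    Hypothetical helper: it mirrors the n_candidates * n_splits product
    used for the progress bar total, nothing more.
    """
    n_candidates = 1
    for values in param_grid.values():
        # Every value of one parameter is combined with every value of the others.
        n_candidates *= len(values)
    return n_candidates, n_candidates * n_splits


# Assumed example grid: 3 values of C times 2 kernels = 6 candidates.
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
n_candidates, total_fits = count_fits(grid, n_splits=5)
print(n_candidates, total_fits)  # 6 30
```

Multiplying the total by the time of a single fit gives a crude estimate of how long the whole search will take before you start it.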
39
I'd like to add to DavidS's answer.
To give you a simple example, with verbose=1 the output looks like this:
Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
And with verbose=10 it looks like this:
Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.0s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total= 6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.5s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 20.0s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total= 6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 26.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total= 7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 34.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total= 6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 41.6s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total= 7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 7 out of 7 | elapsed: 48.7s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total= 7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 55.9s remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total= 6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done 9 out of 9 | elapsed: 1.0min remaining: 0.0s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total= 6.6s
[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.2min finished
In my case, verbose=1 was enough.
171
Set the verbose parameter of GridSearchCV to a positive number (the larger the number, the more detail you get). For example:
GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)
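For anyone who wants to try this quickly, here is a small self-contained sketch. The iris dataset, the SVC classifier and the tiny grid are assumptions chosen only so it finishes in seconds; they are not from the question. The exact format of the log lines depends on the scikit-learn version, so no output is shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small built-in dataset, assumed here only to keep the run short.
X, y = load_iris(return_X_y=True)

# Illustrative grid: 2 values of C times 2 kernels = 4 candidates.
param_grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}

# verbose > 0 prints progress while fitting; higher values print more detail.
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy", verbose=3)
search.fit(X, y)

# After the search finishes, every candidate's mean score is in cv_results_.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
print("best:", search.best_params_)
```

Note that cv_results_ is only available once fit has returned, so during a long run the verbose log is what shows the scores of the combinations tried so far.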