How do I pass a parameter to only one part of a Pipeline object in scikit-learn?

import numpy as np
import sklearn.ensemble

X = np.array([
    [2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0, 2.0, '0'],
    [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 7.0, 14.0, 2.0, '0'],
    [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 9.0, 8.0, 2.0, '0'],
    [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 5.0, 3.0, 1.0, '0'],
], dtype=object)
y = np.array([0., 0., 1., 0.])

# Passing sample_weight directly to the classifier's fit()
m = sklearn.ensemble.RandomForestClassifier(
    random_state=0,
    oob_score=True,
    n_estimators=100,
    min_samples_leaf=5,
    max_depth=10)
m.fit(X, y, sample_weight=np.array([3,4,2,3]))

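import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

# The same classifier wrapped in a Pipeline behind a feature-selection step
m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0,
        oob_score=True,
        n_estimators=500,
        min_samples_leaf=5,
        max_depth=10))])

# Raises ValueError: the keyword is not prefixed with a step name
m.fit(X, y, sample_weight=np.array([3,4,2,3]))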

ValueError                                Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
     25         max_depth=10))])
     26
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
    128         data, then fit the transformed data using the final estimator.
    129         """
--> 130         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    131         self.steps[-1][-1].fit(Xt, y, **fit_params)
    132         return self

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    113         fit_params_steps = dict((step, {}) for step, _ in self.steps)
    114         for pname, pval in six.iteritems(fit_params):
--> 115             step, param = pname.split('__', 1)
    116             fit_params_steps[step][param] = pval
    117         Xt = X

ValueError: need more than 1 value to unpack
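The last frame of the traceback suggests why the call fails: each fit keyword is split on the first '__' into a step name and a parameter name, and a bare keyword has nothing to split. Below is a minimal pure-Python sketch of that routing. split_fit_params is a hypothetical helper modeled on the lines shown in the traceback, not part of scikit-learn's API:

```python
def split_fit_params(step_names, fit_params):
    """Group fit keywords by pipeline step (hypothetical helper, not sklearn API)."""
    per_step = {name: {} for name in step_names}
    for pname, pval in fit_params.items():
        # 'model__sample_weight' -> ('model', 'sample_weight');
        # a name without '__' yields a single piece, so unpacking raises ValueError.
        step, param = pname.split("__", 1)
        per_step[step][param] = pval
    return per_step


steps = ["feature_selection", "model"]

# Prefixed keyword: routed to the 'model' step.
routed = split_fit_params(steps, {"model__sample_weight": [3, 4, 2, 3]})

# Unprefixed keyword: reproduces the unpacking error from the traceback.
try:
    split_fit_params(steps, {"sample_weight": [3, 4, 2, 3]})
    unprefixed_error = None
except ValueError as exc:
    unprefixed_error = exc
```

On Python 3 the same failure is worded "not enough values to unpack (expected 2, got 1)"; the traceback above comes from Python 2.7. Newer scikit-learn releases may report the problem with a different message.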

3 answers

User

#1 · edited 2024-06-05 18:14:34

You can also use the set_params method and prefix the parameter with the name of the step.

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.set_params(model__sample_weight=np.array([3,4,2,3]))

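User

#2 · edited 2024-06-05 18:14:34

I wish I could leave this as a comment on @rovyko's post above rather than as a separate answer, but I don't yet have enough Stack Overflow reputation to comment, so here it is.

You cannot use:

Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))

to set a parameter of the RandomForestClassifier.fit() method. As its source code shows, Pipeline.set_params() only applies to the initialization parameters of the individual steps in the pipeline. RandomForestClassifier has no initialization parameter named sample_weight (see its __init__() method). sample_weight is actually an input parameter of RandomForestClassifier's fit() method, so it can only be set with the approach given in the accepted answer by @ali_m, that is:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))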
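The difference can be seen in a short sketch on made-up data. X_demo, y_demo, and w_demo are hypothetical, and f_classif is used in place of the question's f_regression because the target here is a class label:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Toy data, for illustration only
rng = np.random.RandomState(0)
X_demo = rng.rand(40, 6)
y_demo = np.array([0, 1] * 20)
w_demo = rng.randint(1, 5, size=40)

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# set_params only reaches constructor parameters, so sample_weight is rejected.
try:
    pipe.set_params(model__sample_weight=w_demo)
    set_params_failed = False
except ValueError:
    set_params_failed = True

# A constructor parameter, by contrast, is accepted.
pipe.set_params(model__n_estimators=20)

# Fit parameters go through fit(), prefixed with the step name.
pipe.fit(X_demo, y_demo, model__sample_weight=w_demo)
```

Here set_params_failed should end up True, while the final fit() call is expected to run without error.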

User

#3 · edited 2024-06-05 18:14:34

From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.

So you can simply insert model__ in front of any fit parameter kwargs you want to pass to your 'model' step:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
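To check that the prefixed keyword really reaches the final estimator, one option is to compare the pipeline against the same two steps fitted by hand. This is only a sketch on made-up data: X_demo, y_demo, and w_demo are hypothetical, and f_classif stands in for the question's f_regression because the target is a class label:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Toy data, for illustration only
rng = np.random.RandomState(0)
X_demo = rng.rand(40, 6)
y_demo = np.array([0, 1] * 20)
w_demo = rng.randint(1, 5, size=40)

# Pipeline: the weights are routed to the 'model' step via the prefix.
pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipe.fit(X_demo, y_demo, model__sample_weight=w_demo)

# The same two steps done by hand, passing the weights directly.
selector = SelectKBest(score_func=f_classif, k=3)
X_sel = selector.fit_transform(X_demo, y_demo)
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_sel, y_demo, sample_weight=w_demo)

# With identical seeds and weights, both models should agree.
same = np.allclose(pipe.predict_proba(X_demo), forest.predict_proba(X_sel))
```

If the weights are forwarded as expected, same should be True.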
