scikit-learn BaggingClassifier with a custom base estimator: operands could not be broadcast together?

Posted 2024-04-25 12:00:13


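I am trying to use a custom classifier with scikit-learn's BaggingClassifier, but I get an error and cannot work out where it comes from. My classifier object passes check_estimator(), and fit() runs without any problem: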

model = ensemble.BaggingClassifier(customEstimator, max_samples=1/n_estimators, n_estimators=n_estimators)
model.fit(trainfeat, trainlabels)
model.predict(testfeat)

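This produces the traceback below. The base estimator itself makes binary predictions by thresholding a sigmoid. I understand that the shapes must correspond to the test data, but I don't understand what the three operands are supposed to be. Also, the error appears to be raised inside BaggingClassifier, but the problem must originate in my code, mustn't it?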

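I would rather not paste the whole estimator, but it inherits from BaseEstimator and I only wrote/overrode three functions: fit, predict, and predict_proba. Am I missing something here?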

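I have tried reshaping the features/labels, but it had no effect; the error did not even change. I also tried making my estimator inherit from ClassifierMixin, but that ended up causing a lot of new problems.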

  File "Main_File.py", line 76, in <module>
    model.predict(testfeat)

  File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 310, in predict
    indices.extend(np.where(_predict_binary(e, X) > thresh)[0])

  File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 98, in _predict_binary
    score = estimator.predict_proba(X)[:, 1]

  File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 698, in predict_proba
    for i in range(n_jobs))

  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 1003, in __call__
    if self.dispatch_one_batch(iterator):

  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 834, in dispatch_one_batch
    self._dispatch(tasks)

  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 753, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)

  File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 201, in apply_async
    result = ImmediateResult(func)

  File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 582, in __init__
    self.results = batch()

  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]

  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]

  File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 129, in _parallel_predict_proba
    proba += proba_estimator

ValueError: operands could not be broadcast together with shapes (100000,2) (100000,) (100000,2)

1 answer
User
#1 · Posted 2024-04-25 12:00:13

My guess is that the problem comes from the output of your customEstimator's predict_proba.
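The three shapes in the error message can be reproduced with NumPy alone. The sketch below is only an illustration: the variable names mirror the `proba += proba_estimator` line in the traceback, and the 1-D array is an assumed stand-in for what a predict_proba that returns only the positive-class probability would produce.

```python
import numpy as np

n_samples = 5  # small stand-in for the 100000 test rows in the traceback

# Accumulator with one column per class, like the (100000, 2) operand.
proba = np.zeros((n_samples, 2))

# Assumed output of a predict_proba that returns only P(class 1) as a 1-D array,
# like the (100000,) operand.
proba_estimator = np.full(n_samples, 0.7)

try:
    # Same statement as the failing line in the traceback. The third shape in
    # the message is the output array of the in-place addition.
    proba += proba_estimator
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the first and third operands are the accumulator and the result of the in-place addition, and the middle one is what the base estimator returned.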

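Judging by the middle shape in the error message, (100000,), your current implementation returns a one-dimensional array of shape (n_samples,), which is incompatible. For a binary classification problem, make sure the output of predict_proba has shape (n_samples, 2).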
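As a minimal sketch of that fix, here is a toy estimator whose predict_proba returns one column per class. SigmoidThresholdClassifier and its class-mean fitting rule are hypothetical and only stand in for the customEstimator that was not posted; the part that matters is the np.column_stack call and the classes_ attribute set in fit.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier


class SigmoidThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for customEstimator (binary problems only)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Toy fitting rule (an assumption): project onto the direction
        # between the two class means.
        mean_neg = X[y == self.classes_[0]].mean(axis=0)
        mean_pos = X[y == self.classes_[1]].mean(axis=0)
        self.coef_ = mean_pos - mean_neg
        self.intercept_ = -0.5 * (mean_pos + mean_neg) @ self.coef_
        return self

    def predict_proba(self, X):
        scores = np.asarray(X, dtype=float) @ self.coef_ + self.intercept_
        p_pos = 1.0 / (1.0 + np.exp(-scores))
        # One column per class, ordered like classes_: shape (n_samples, 2).
        return np.column_stack([1.0 - p_pos, p_pos])

    def predict(self, X):
        # Threshold the positive-class probability at 0.5.
        return self.classes_[(self.predict_proba(X)[:, 1] > 0.5).astype(int)]


# Synthetic two-class data, made up for this example.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 3)), rng.normal(1.0, 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

model = BaggingClassifier(SigmoidThresholdClassifier(), n_estimators=5, random_state=0)
model.fit(X, y)
bagged_proba = model.predict_proba(X)
print(bagged_proba.shape)
```

If your predict_proba currently returns only the positive-class probability p, stacking 1 - p and p as two columns, as above, should be enough for the in-place addition in BaggingClassifier to succeed.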
