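<p>I'm trying to use <code>sklearn.inspection.plot_partial_dependence</code> to create partial dependence plots for a model that I built successfully with keras and the keras sklearn wrapper utility (see the code block below). The wrapped model builds successfully, its fit method works, and after fitting its predict method returns the expected results. All indications are that it is a valid estimator. However, when I try to run plot_partial_dependence from sklearn.inspection, I get error text suggesting that it is not a valid estimator, even though I can show that it is.</p>
<p>I have edited this to make it easier to reproduce by using the sklearn example Boston housing data.</p>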
<pre><code>from sklearn.datasets import load_boston
from sklearn.inspection import plot_partial_dependence, partial_dependence
from keras.wrappers.scikit_learn import KerasRegressor
import keras
import tensorflow as tf
import pandas as pd
boston = load_boston()
feature_names = boston.feature_names
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target
mean = X.describe().transpose()['mean']
std = X.describe().transpose()['std']
X_norm = (X-mean)/std
def build_model_small():
    model = keras.Sequential([
        keras.layers.Dense(64, activation='relu', input_shape=[len(X.keys())]),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.0005)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse', 'mape'])
    return model
kr = KerasRegressor(build_fn=build_model_small,verbose=0)
kr.fit(X_norm,y, epochs=100, validation_split = 0.2)
pdp_plot = plot_partial_dependence(kr,X_norm,feature_names)
</code></pre>
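<p>As I said, if I run <code>kr.predict(X.head(20))</code>, I get 20 predictions of <code>y</code> for the first 20 rows of <code>X</code>, which is what I'd expect from a valid estimator.</p>
<p>But the error text I get from plot_partial_dependence is as follows:</p>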
<pre><code>Traceback (most recent call last):
File "temp_ML_tf_sklearn_postproc.py", line 79, in <module>
pdp_plot = plot_partial_dependence(kr,X,labels[:-1])
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/sklearn/inspection/_partial_dependence.py", line 678, in plot_partial_dependence
for fxs in features)
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 921, in __call__
if self.dispatch_one_batch(iterator):
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 549, in __init__
self.results = batch()
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/home/mymachine/anaconda3/lib/python3.7/site-packages/sklearn/inspection/_partial_dependence.py", line 307, in partial_dependence
"'estimator' must be a fitted regressor or classifier."
ValueError: 'estimator' must be a fitted regressor or classifier.
</code></pre>
<p>I looked at the source code of plot_partial_dependence, and it has the following.
First, in the docstring, it says that the first input <code>estimator</code> must be</p>
<blockquote>
<pre><code> A fitted estimator object implementing :term:`predict`,
:term:`predict_proba`, or :term:`decision_function`.
Multioutput-multiclass classifiers are not supported.
</code></pre>
</blockquote>
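<p>My estimator does implement <code>.predict</code>.</p>
<p>Second, the line called out in the error traceback calls a checker that tests whether the estimator is a regressor or a classifier:</p>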
<pre><code>if not (is_classifier(estimator) or is_regressor(estimator)):
    raise ValueError(
        "'estimator' must be a fitted regressor or classifier."
    )
</code></pre>
<p>I looked at the source code of is_regressor(), and it is a one-liner, as follows:</p>
<pre><code>return getattr(estimator, "_estimator_type", None) == "regressor"
</code></pre>
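<p>To make that check concrete, here is a minimal, library-free sketch of what it does. <code>is_regressor_sketch</code>, <code>PlainWrapper</code> and <code>TaggedWrapper</code> are hypothetical stand-ins that I made up for illustration; they are not sklearn or keras classes. The sketch only shows that having a working <code>predict</code> method is not enough to pass this particular test.</p>

```python
def is_regressor_sketch(estimator):
    """Same logic as the one-line check quoted above."""
    return getattr(estimator, "_estimator_type", None) == "regressor"


class PlainWrapper:
    """Hypothetical wrapper: it can predict, but carries no estimator tag."""

    def predict(self, X):
        return [0.0 for _ in X]


class TaggedWrapper(PlainWrapper):
    """The same wrapper plus the class attribute the check looks for."""

    _estimator_type = "regressor"


plain_ok = is_regressor_sketch(PlainWrapper())
tagged_ok = is_regressor_sketch(TaggedWrapper())
print(plain_ok, tagged_ok)
```

<p>In this toy setup only the tagged class is accepted, even though both classes predict identically.</p>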
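<p>So I tried to hack it by doing <code>setattr(mp,'_estimator_type','regressor')</code>, and it just says <code>AttributeError: can't set attribute</code>, so that cheap workaround did not work.</p>
<p>I even tried a hackier fix and temporarily commented out the offending check in the _partial_dependence.py source (the if statement I copied above), and got the following error:</p>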
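<p>One way an <code>AttributeError</code> like this can arise in Python is when the attribute is defined on the class as a property without a setter. I have not confirmed that this is the mechanism for my object; the sketch below uses a hypothetical <code>ReadOnlyTag</code> class purely to show the behaviour.</p>

```python
class ReadOnlyTag:
    """Hypothetical stand-in: exposes the tag as a property with no setter."""

    @property
    def _estimator_type(self):
        return None


obj = ReadOnlyTag()
try:
    # Assigning to a property that has no setter raises AttributeError.
    setattr(obj, "_estimator_type", "regressor")
    set_failed = False
except AttributeError:
    set_failed = True

print(set_failed)
```

<p>If that is what is going on, the tag would have to be supplied some other way, for example on a subclass, rather than assigned on the instance.</p>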
<pre><code>Traceback (most recent call last):
File "temp_ML_tf_sklearn_postproc.py", line 79, in <module>
pdp_plot = plot_partial_dependence(kr,X,labels[:-1])
File "/home/billy/anaconda3/lib/python3.7/site-packages/sklearn/inspection/_partial_dependence.py", line 678, in plot_partial_dependence
for fxs in features)
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 921, in __call__
if self.dispatch_one_batch(iterator):
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 549, in __init__
self.results = batch()
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/home/billy/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/home/billy/anaconda3/lib/python3.7/site-packages/sklearn/inspection/_partial_dependence.py", line 317, in partial_dependence
check_is_fitted(est)
File "/home/billy/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 967, in check_is_fitted
raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This KerasRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
</code></pre>
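<p>This comes back to the issue of the sklearn function not thinking this model is fitted, when it actually is. Anyway, at this point I decided not to try any more dangerous, hacky fixes that involve messing with the source code.</p>
<p>I also tried passing <code>kr.fit(X,y,etc...)</code> directly as the first argument of plot_partial_dependence. The computer churned for a few minutes, indicating that the fit was actually running, but when it then tried to run the partial dependence plot I got the same error.</p>
<p>There is one more, rather confusing, clue. I tried using the keras/sklearn wrapped pipeline in a different sklearn function altogether, to see whether it would work with any sklearn utility. This time, I did:</p>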
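<p>My understanding, which I have not verified against every sklearn version, is that when <code>check_is_fitted</code> is called without an explicit attribute list it decides "fitted" by looking for instance attributes whose names end in an underscore. The sketch below imitates that rule without importing sklearn. All names in it (<code>check_is_fitted_sketch</code>, <code>WrapperSketch</code>, <code>WrapperWithMarker</code>, <code>is_fitted_</code>) are hypothetical, and storing the trained network under <code>model</code> is my assumption about how the keras wrapper behaves.</p>

```python
class NotFittedSketchError(Exception):
    """Stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted_sketch(estimator):
    """Assumed default rule: at least one instance attribute ends in '_'."""
    fitted_attrs = [
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    ]
    if not fitted_attrs:
        raise NotFittedSketchError(type(estimator).__name__ + " is not fitted yet")


class WrapperSketch:
    """Hypothetical wrapper that keeps its fitted state under 'model'."""

    def fit(self, X, y):
        self.model = "trained network"  # no trailing underscore
        return self


class WrapperWithMarker(WrapperSketch):
    """Same wrapper, but fit also sets a trailing-underscore attribute."""

    def fit(self, X, y):
        super().fit(X, y)
        self.is_fitted_ = True  # hypothetical marker attribute
        return self


def passes_check(estimator):
    try:
        check_is_fitted_sketch(estimator)
        return True
    except NotFittedSketchError:
        return False


plain_passes = passes_check(WrapperSketch().fit([[0.0]], [0.0]))
marked_passes = passes_check(WrapperWithMarker().fit([[0.0]], [0.0]))
print(plain_passes, marked_passes)
```

<p>Under these assumptions, a wrapper that has really been trained can still be reported as not fitted simply because of how it names its attributes.</p>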
<pre><code>from sklearn.model_selection import cross_validate
cv_scores = cross_validate(kr,X_norm,y, cv=4, return_train_score=True, n_jobs=-1)
</code></pre>
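<p>That worked! So I don't think there is anything inherently wrong with my use of <code>keras.wrappers.scikit_learn.KerasRegressor</code>.</p>
<p>This may just be a case where what I'm trying to do is an edge case that the plot_partial_dependence source code did not specifically plan for, and I'm out of luck, but I'd like to know whether anyone else has seen a problem like this and has a solution or workaround.</p>
<p>By the way, I'm using sklearn 0.22.1 and Python 3.7.3 (Anaconda). To be clear, I have used plot_partial_dependence successfully on sklearn-built models and even pipelines. This problem only occurs with keras-based models. Thanks very much for any input.</p>
<p>Edit:</p>
<p>A previous version of this question involved building a pipeline with StandardScaler() and then the KerasRegressor wrapped object. I have since found that this happens even with just the KerasRegressor object, i.e. I have isolated the problem to it rather than to the pipeline. So, as a commenter suggested, I removed the pipeline part from the question to make it simpler and more to the point.</p>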