<p>A random forest is an <a href="https://scikit-learn.org/stable/modules/ensemble.html" rel="nofollow noreferrer">ensemble method</a>. Essentially, it builds individual decision trees on different bootstrap samples of the data (a technique called bagging) and averages the trees' predictions to produce the probabilities. The linked documentation is actually a good starting point:</p>
<blockquote>
<p>In averaging methods, the driving principle is to build several
estimators independently and then to average their predictions. On
average, the combined estimator is usually better than any of the
single base estimator because its variance is reduced.</p>
<p>Examples: Bagging methods, Forests of randomized trees, …</p>
</blockquote>
<p>Therefore, the probabilities always sum to one. Below is an example showing how to access the individual predictions of each tree:</p>
<pre><code>from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.33, random_state=42)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
model.fit(X_train, y_train)
pred = model.predict_proba(X_test)
pred[:5,:]
array([[0. , 1. , 0. ],
[1. , 0. , 0. ],
[0. , 0. , 1. ],
[0. , 0.9, 0.1],
[0. , 0.9, 0.1]])
</code></pre>
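<p>As a quick sanity check (a self-contained sketch mirroring the snippet above; the dataset and hyperparameters are the same toy setup), you can confirm that every row of <code>predict_proba</code> is a probability distribution:</p>
<pre><code>import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

model = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)
pred = model.predict_proba(X_test)

# Each row is a distribution over the three iris classes,
# so the row sums are all exactly one.
rows_sum_to_one = np.allclose(pred.sum(axis=1), 1.0)
</code></pre>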
<p>This is the prediction of the first tree:</p>
<pre><code>model.estimators_[0].predict(X_test)
Out[42]:
array([1., 0., 2., 2., 1., 0., 1., 2., 2., 1., 2., 0., 0., 0., 0., 2., 2.,
1., 1., 2., 0., 2., 0., 2., 2., 2., 2., 2., 0., 0., 0., 0., 1., 0.,
0., 2., 1., 0., 0., 0., 2., 2., 1., 0., 0., 1., 1., 2., 1., 2.])
</code></pre>
<p>Now we tally the predictions across all the trees:</p>
<pre><code>import numpy as np

result = np.zeros((len(X_test), 3))
for i in range(len(model.estimators_)):
    # Each tree votes for one class per sample; count the votes.
    p = model.estimators_[i].predict(X_test).astype(int)
    result[range(len(X_test)), p] += 1
result[:5,:]
Out[63]:
array([[ 0., 10., 0.],
[10., 0., 0.],
[ 0., 0., 10.],
[ 0., 9., 1.],
[ 0., 9., 1.]])
</code></pre>
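<p>The loop can also be vectorized; a sketch using <code>np.stack</code> and <code>np.bincount</code> (same toy setup as above, rebuilt here so the snippet is self-contained):</p>
<pre><code>import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

model = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)

# Stack each tree's hard predictions: shape (n_trees, n_samples).
votes = np.stack([t.predict(X_test).astype(int) for t in model.estimators_])

# Count the votes per class for every sample: shape (n_samples, 3).
result = np.apply_along_axis(np.bincount, 0, votes, minlength=3).T
</code></pre>
<p>Every row of <code>result</code> sums to <code>n_estimators</code>, since each tree casts exactly one vote per sample.</p>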
<p>Dividing by the number of trees gives the probabilities you obtained earlier:</p>
<pre><code>result/10
Out[65]:
array([[0. , 1. , 0. ],
[1. , 0. , 0. ],
[0. , 0. , 1. ],
[0. , 0.9, 0.1],
[0. , 0.9, 0.1]])
</code></pre>
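<p>Strictly speaking, scikit-learn's <code>RandomForestClassifier.predict_proba</code> averages each tree's <em>predicted class probabilities</em> rather than counting hard votes; the two coincide here because fully grown trees have pure leaves, so each tree's probabilities are one-hot. A sketch of the actual averaging (same toy setup, rebuilt so the snippet stands alone):</p>
<pre><code>import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

model = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)

# Average each tree's predicted class probabilities; this matches
# what the forest's own predict_proba computes.
per_tree = np.stack([t.predict_proba(X_test) for t in model.estimators_])
avg = per_tree.mean(axis=0)

same = np.allclose(avg, model.predict_proba(X_test))
</code></pre>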