RandomForest: high OOB score but low KFold validation score

Posted 2024-04-19 00:24:31


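I have been training a random forest model on the Titanic dataset. Many articles say that cross-validation is unnecessary for an RF classifier, while a few say it can still be used. I tried both approaches, but I don't know how to interpret the resulting scores. Without cross-validation, I suspect my model is overfitting.

The model's OOB score is 96.85 and its mean cross-validation score is 83.27 (if I set scoring='f1', the score is 74.01).

Here is my code: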

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10, random_state=44, oob_score=True)

clf.fit(titanic[features], titanic['Survived'])

clf.score(titanic[features], titanic['Survived'])

score : 0.9685746352413019

from sklearn.model_selection import KFold, cross_val_score

predictors = features
clf = RandomForestClassifier(random_state=10, n_estimators=10)
clf.fit(titanic[features],titanic["Survived"])

kf = KFold(n_splits=10)

# cross_val_score clones clf and refits it on each fold
scores = cross_val_score(clf, titanic[predictors], titanic["Survived"], cv=kf)

print(scores.mean())
score : 83.27

Can anyone explain these scores?

Thanks


1 answer

Answer 1 · Posted 2024-04-19 00:24:31

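clf.score does not return the OOB score. It returns the accuracy on the data passed to it, which here is the training data, so the 0.9685 figure is a training score and is expected to be much higher than the cross-validation score.

The OOB score is available through the clf.oob_score_ attribute (it is an attribute, not a method) after fitting with oob_score=True.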
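To make the distinction concrete, here is a minimal sketch that prints all three numbers side by side. It rests on a few assumptions: the Titanic data from the question is not available here, so a synthetic dataset from make_classification stands in for it, and n_estimators is raised to 100 because with only 10 trees some samples may never be out-of-bag. The exact values will differ from those in the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Assumption: synthetic stand-in for titanic[features] / titanic["Survived"].
X, y = make_classification(
    n_samples=891, n_features=8, n_informative=4,
    flip_y=0.2, random_state=0,
)

# Assumption: 100 trees instead of the question's 10, so that every sample
# is very likely to be out-of-bag for at least one tree.
clf = RandomForestClassifier(n_estimators=100, random_state=44, oob_score=True)
clf.fit(X, y)

# Accuracy on the same data the forest was fitted on (optimistic).
train_acc = clf.score(X, y)

# Out-of-bag accuracy: each sample is scored only by trees that did not see it.
oob_acc = clf.oob_score_

# 10-fold cross-validation accuracy on a fresh, unfitted estimator.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
cv_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=44), X, y, cv=kf
).mean()

print(f"training accuracy: {train_acc:.4f}")
print(f"OOB accuracy:      {oob_acc:.4f}")
print(f"10-fold CV mean:   {cv_acc:.4f}")
```

On data like this, the training accuracy should come out well above the other two, while the OOB and cross-validation figures estimate the same thing (performance on unseen data) and tend to be close to each other.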
