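I created a new ensemble method that votes manually among my three classifiers. (Courtesy of Daniel, who helped me with this function: Improving the prediction score by use of confidence level of classifiers on instances.)
The purpose of this manual voting is to accept, for each instance, the answer of the most confident classifier. Here is the code and the scores it produces: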
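import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# parameters for random forest
rfclf_params = {
    'n_estimators': 500,
    'bootstrap': True,
    'class_weight': None,
    'criterion': 'gini',
    'max_depth': None,
    'max_features': 'auto',
    'warm_start': True,
    'random_state': 41
    # ... fill in the rest you want here
}
# Fill in svm params here
svm_params = {
    'C': 100,
    'probability': True,
    'random_state': 42
}
# KNeighbors params go here
kneighbors_params = {
    'n_neighbors': 5,
    'weights': 'distance'
}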
y_test_classes = (y_test_sl, y_test_lim, y_test_shale, y_test_sandlim,
                  y_test_ss, y_test_dol, y_test_sand)
classifiers = [RandomForestClassifier, SVC, KNeighborsClassifier]
params = [rfclf_params, svm_params, kneighbors_params]
y_trains_classes = (y_train_sl, y_train_lim, y_train_shale, y_train_sandlim,
                    y_train_ss, y_train_dol, y_train_sand)
y_classes_names = ("shaly limestone", "limestone", "shale", "sandy lime",
                   "shaly sandstone", "dolomite", "sandstone")
# Just get predictions
for y_trains, y_test, y_strings in zip(y_trains_classes, y_test_classes, y_classes_names):
    y_preds_test = ensemble_test(classifiers, params, X_train, y_trains, X_test_prepared)
    print("\n", "Accuracy score for", y_strings, "=", accuracy_score(y_test, y_preds_test))
    print("f1_score for", y_strings, "=", f1_score(y_test, y_preds_test,
          average='weighted', labels=np.unique(y_preds_test)))
    print("roc auc score for", y_strings, "=", roc_auc_score(y_test, y_preds_test,
          average='weighted'))

Accuracy score for shaly limestone = 0.949514563107
f1_score for shaly limestone = 0.949653574035
roc auc score for shaly limestone = 0.933362369338

Accuracy score for limestone = 0.957281553398
f1_score for limestone = 0.957272532095
roc auc score for limestone = 0.957311555515

Accuracy score for shale = 0.95145631068
f1_score for shale = 0.948556595316
roc auc score for shale = 0.845505617978

Accuracy score for sandy lime = 0.998058252427
f1_score for sandy lime = 0.998008114117
roc auc score for sandy lime = 0.95

Accuracy score for shaly sandstone = 0.996116504854
f1_score for shaly sandstone = 0.998054474708
roc auc score for shaly sandstone = 0.5

Accuracy score for dolomite = 1.0
f1_score for dolomite = 1.0
roc auc score for dolomite = 1.0

Accuracy score for sandstone = 0.996116504854
f1_score for sandstone = 0.996226826208
roc auc score for sandstone = 0.997995991984

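When I wanted to plot the ROC curves, I realized I needed predict_proba output from this function, so, following the suggestion in the link above, I changed the function to return probabilities instead.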
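The probability-returning function itself is not reproduced in this post. For readers following along, here is a minimal sketch of what such a function might look like. It is not the original ensemble_proba: the name most_confident_proba, the synthetic data from make_classification, the assumption of binary 0/1 labels, and the definition of "confidence" as the distance of P(class 1) from 0.5 are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def most_confident_proba(classifiers, params, X_train, y_train, X_test):
    """Hypothetical helper: per instance, return P(class 1) taken from
    whichever classifier is most confident about that instance."""
    pos_probas = []
    for clf_class, clf_params in zip(classifiers, params):
        clf = clf_class(**clf_params).fit(X_train, y_train)
        # Locate the predict_proba column that belongs to the positive label 1.
        pos_col = list(clf.classes_).index(1)
        pos_probas.append(clf.predict_proba(X_test)[:, pos_col])
    pos_probas = np.array(pos_probas)  # shape: (n_classifiers, n_samples)
    # Assumed confidence measure: how far P(class 1) is from 0.5.
    confidence = np.abs(pos_probas - 0.5)
    winner = confidence.argmax(axis=0)  # index of the most confident classifier
    return pos_probas[winner, np.arange(pos_probas.shape[1])]


# Synthetic binary data, for illustration only.
X, y = make_classification(n_samples=120, n_features=6, random_state=0)
demo_classifiers = [RandomForestClassifier, SVC, KNeighborsClassifier]
demo_params = [
    {'n_estimators': 10, 'random_state': 0},
    {'probability': True, 'random_state': 0},
    {'n_neighbors': 5},
]
scores = most_confident_proba(demo_classifiers, demo_params, X[:80], y[:80], X[80:])
```

The point of the sketch is that the returned value is always the probability of the positive class, never the winning classifier's probability for whichever class it picked. roc_curve expects scores that grow with the likelihood of the positive label, so a function that returns the maximum class probability would give it the wrong kind of input.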
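Now, since I want to plot the ROC curve for every class in the test set, I did the following. The resulting ROC curves look very different from what I expected, given that my ROC AUC scores are good for every class except "shaly sandstone".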
for y_trains, y_test, y_strings in zip(y_trains_classes, y_test_classes, y_classes_names):
    y_scores_ensemble_all = ensemble_proba(classifiers, params, X_train, y_trains, X_test_prepared)
    fpr_ensemble_all, tpr_ensemble_all, thresholds_ensemble_all = roc_curve(y_test_all,
                                                                            y_scores_ensemble_all)
    plt.figure(figsize=(8, 6))
    plot_roc_curve(fpr_ensemble_all, tpr_ensemble_all, "Ensemble manual voting")
    plt.legend(loc="lower right", fontsize=16)
    plt.title('ROC curve of Ensemble manual voting of %s' % (y_strings))
    plt.axis([-0.01, 1.01, -0.01, 1.01])
    plt.show()
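Why do the curves look like this? The F1 and ROC AUC scores are good for almost every class, yet the ROC curves look poor. Did I do something wrong when returning probabilities from the function, or is there some reason the curves should look like this?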
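Two things in the code may be worth checking. First, the roc_auc_score values were computed from hard predictions (y_preds_test), while roc_curve is given probabilities, so the two are not measuring the same thing. Second, the plotting loop passes y_test_all to roc_curve rather than the loop variable y_test. The sketch below illustrates only the first point, on a small synthetic example that is not the data from this question: the same underlying scores give a different AUC and a different curve depending on whether they are thresholded first.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labels and scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.8, 0.9])
# Hard 0/1 predictions obtained by thresholding the scores at 0.5.
y_label = (y_score >= 0.5).astype(int)

# AUC from continuous scores uses the full ranking of the instances.
auc_from_scores = roc_auc_score(y_true, y_score)
# AUC from hard labels sees a single operating point.
auc_from_labels = roc_auc_score(y_true, y_label)

# The curve built from hard labels has one point between (0, 0) and (1, 1).
fpr_s, tpr_s, _ = roc_curve(y_true, y_score)
fpr_l, tpr_l, _ = roc_curve(y_true, y_label)

print("AUC from scores:", auc_from_scores)
print("AUC from labels:", auc_from_labels)
print("points on label-based curve:", len(fpr_l))
```

A ROC AUC computed from hard labels therefore cannot be used to predict what the probability-based curve will look like, and the two should not be expected to agree.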