How do I determine the best baseline model for hyperparameter tuning in scikit-learn?

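# Trying out different classifiers and selecting the best
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X, y and preprocessor are defined earlier (not shown in the question)

## Create the list of classifiers we're going to loop through
classifiers = [
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
]
classifier_names = [
    'kNN', 'SVC', 'DecisionTree',
    'RandomForest', 'AdaBoost', 'GradientBoosting'
]
model_scores = []

## Looping through the classifiers
for classifier, name in zip(classifiers, classifier_names):
    # Same preprocessing and feature selection for every candidate
    pipe = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('selector', SelectKBest(k=len(X.columns))),
        ('classifier', classifier)])
    # Mean accuracy over 5 cross-validation folds
    score = cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()
    model_scores.append(score)
    print("Model score for {}: {}".format(name, score))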

Model score for kNN: 0.7472524440239673
Model score for SVC: 0.7896621728161464
Model score for DecisionTree: 0.7302148734267939
Model score for RandomForest: 0.779058799919727
Model score for AdaBoost: 0.7949635904933918
Model score for GradientBoosting: 0.7930712637252372

2 answers

User

#1 · edited 2024-05-15 11:50:12

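No. Before hyperparameter tuning, there is no way to know with 100% certainty which classifier will perform best on a given problem. In practice, though, Kaggle competitions on tabular classification problems (as opposed to text or image classification) suggest that gradient-boosted decision tree models such as XGBoost or LightGBM come out on top in most cases. Given that, GradientBoosting may well do better once its hyperparameters are tuned, since it belongs to the same family of models. Note that scikit-learn's GradientBoostingClassifier is not itself based on LightGBM; the LightGBM-inspired estimator in scikit-learn is HistGradientBoostingClassifier.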

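In the code above, all you are doing is using the default value of every hyperparameter. For algorithms that are more sensitive to hyperparameter tuning, the score at default settings does not necessarily indicate the final (tuned) performance in the way you suggest.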
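One way to see the gap between default and tuned performance is to run a small grid search for each candidate and compare it with the default-settings score. The sketch below is illustrative only: it uses synthetic data from `make_classification` instead of the asker's `X` and `y`, and the parameter grids, along with the names `candidates` and `results`, are hypothetical and deliberately small. Each grid includes the estimator's default values, so the tuned score cannot come out below the default one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (assumption for illustration only)
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=0)

# Hypothetical, deliberately small grids; a real search would be wider
candidates = {
    "SVC": (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
    "GradientBoosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Score with default hyperparameters, as in the question
    default_score = cross_val_score(
        estimator, X_demo, y_demo, cv=5, scoring="accuracy").mean()
    # Best cross-validated score over the grid, using the same 5 folds
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_demo, y_demo)
    results[name] = (default_score, search.best_score_, search.best_params_)
    print("{}: default={:.3f}, tuned={:.3f}, best params={}".format(
        name, default_score, search.best_score_, search.best_params_))
```

Because `best_score_` is picked as the maximum over the grid, it is an optimistic estimate. A held-out test set or nested cross-validation gives a fairer comparison between the tuned models.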

User

#2 · edited 2024-05-15 11:50:12

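Yes. Methods such as univariate, bivariate, and multivariate analysis let you examine the data first and then decide which model to start from as a baseline.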
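One possible reading of "univariate and bivariate analysis" is sketched here, under assumptions that are not from the answer: the univariate step scores each feature against the label with `f_classif` (the default score function of `SelectKBest`), and the bivariate step looks at pairwise feature correlations with NumPy. The data is synthetic and the variable names are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic stand-in for the asker's data (assumption for illustration only)
X_demo, y_demo = make_classification(n_samples=300, n_features=6,
                                     n_informative=3, n_redundant=1,
                                     random_state=0)

# Univariate: ANOVA F-statistic and p-value of each feature vs. the label
f_scores, p_values = f_classif(X_demo, y_demo)

# Bivariate: Pearson correlation between every pair of features
feature_corr = np.corrcoef(X_demo, rowvar=False)

for i, (f_stat, p_val) in enumerate(zip(f_scores, p_values)):
    print("feature {}: F={:.2f}, p={:.3g}".format(i, f_stat, p_val))

# Zero out the diagonal so only correlations between distinct features count
off_diagonal = np.abs(feature_corr - np.eye(feature_corr.shape[0]))
print("largest |correlation| between two features:", off_diagonal.max())
```

Strong linear signal in individual features hints that a linear baseline may be enough, while weak univariate scores suggest starting from a tree-based or kernel model that can capture interactions.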

You can also use scikit-learn's guide to choosing the right estimator:

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
