Problem using GridSearchCV with a random forest classifier

-3 votes

1 answer

34 views

Asked 2025-04-14 18:01

I am working on a heart disease classification problem using RandomForestClassifier, and I ran into the error below while tuning its hyperparameters. I use sklearn's Pipeline and ColumnTransformer for preprocessing.

Error: 720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
UserWarning: One or more of the test scores are non-finite

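from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X, y, numerical_features and categorical_features are assumed to be defined already.

# Standardize the numerical columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)

# One-hot encode the categorical columns, ignoring categories unseen during fit
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)

preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit the preprocessing on the training split only, then apply it to the test split
scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)

param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]
              }

grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5,
                    scoring='accuracy', verbose=True, n_jobs=-1)
grid.fit(scaled_X_train, y_train)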

hyperparameter-tuning classification random-forest grid-search

1 Answer

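Judging from the error message, some of the hyperparameter combinations are causing the failures: most of your fits run fine, but a portion of them fail. Remove 1 from the list of values for min_samples_split, because an integer value for this parameter must be 2 or greater.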
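The numbers in the warning are consistent with this diagnosis: the grid has 4 × 3 × 4 × 3 × 3 = 432 combinations, which gives 2160 fits with cv=5, and the 720 failures are exactly the one third of them that use min_samples_split=1. As a rough sketch of the fix, the snippet below runs a reduced grid without the value 1. The synthetic data from make_classification, the smaller grid, and cv=3 are assumptions made only to keep the example quick; they stand in for the preprocessed heart disease data in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed training data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Reduced version of the grid from the question, with 1 removed from min_samples_split.
param_grid = {
    "max_depth": [3, None],
    "n_estimators": [10],
    "min_samples_split": [2, 3],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)

# With only valid values in the grid, no fold score should be nan.
all_finite = bool(np.isfinite(grid.cv_results_["mean_test_score"]).all())
print(grid.best_params_, all_finite)
```

If every mean test score is finite, the "fits failed" warning should no longer appear for this grid.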

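If that does not resolve the error, add error_score='raise' to GridSearchCV so that the first failing fit raises its exception with the full error message instead of being scored as nan.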
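To illustrate what error_score='raise' surfaces, here is a small sketch that deliberately keeps the invalid value 1 in the grid and catches the resulting exception. The synthetic data and the one-parameter grid are assumptions for illustration only, and the exact exception class and message wording can differ between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data (an assumption for illustration).
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"min_samples_split": [1, 2]},  # 1 is intentionally invalid here
    cv=3,
    error_score="raise",  # re-raise the underlying error instead of recording nan
)

failure = None
try:
    grid.fit(X, y)
except Exception as exc:  # broad on purpose: the exception class varies by version
    failure = exc
    print(type(exc).__name__, exc)
```

The printed message names the offending parameter, which is usually enough to tell which entry in param_grid to correct.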

Answered 2025-04-14 by Python大师
