Comparing imputer methods with Python GridSearchCV?

param_grid = [
    {'method': ['mean', 'median', 'most frequent']},
]
forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(titanic_features[method], titanic_values[method])

1 Answer

User

#1 · Posted 2024-04-28 11:29:41

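The scikit-learn Pipeline is designed for exactly this. Create a pipeline with an imputer step placed before the regressor. You can then pass component-specific parameters through the grid-search parameter grid by naming each one as the step name, a double underscore (__), and the parameter name, for example impute__strategy.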
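As a small aside on the naming convention, one way to see which names a parameter grid will accept is to list the pipeline's own parameters. This is only a sketch: the step names "impute" and "regressor" are chosen to match the sample code in this answer, and any other names would work the same way.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Step names ("impute", "regressor") become the prefixes before the "__"
pipe = Pipeline(steps=[("impute", SimpleImputer()),
                       ("regressor", RandomForestRegressor())])

# Every key returned by get_params() is a valid name for a parameter grid
names = sorted(pipe.get_params().keys())
print([n for n in names if n.startswith("impute__")])

# set_params() uses the same step__parameter naming
pipe.set_params(impute__strategy="median")
print(pipe.named_steps["impute"].strategy)  # median
```

A name that is missing from this list is rejected with an error when the search tries to set it.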

Sample code (documented inline)

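import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Sample/synthetic data of shape 1000 x 2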
X = np.random.randn(1000,2)
y = 1.5*X[:,0]+3.2*X[:, 1]+2.4

# Randomly set up to 200 values in each column to NaN (sampled indices may repeat)
X[np.random.randint(0,1000, 200), 0] = np.nan
X[np.random.randint(0,1000, 200), 1] = np.nan

# Simple pipeline which has an imputer followed by regressor
pipe = Pipeline(steps=[('impute', SimpleImputer(missing_values=np.nan)),
                       ('regressor', RandomForestRegressor())])

# 3 imputation strategies and 2 max_depth values,
# for a total of 6 parameter combinations to search
param_grid = {
        'impute__strategy': ["mean", "median", "most_frequent"],
        'regressor__max_depth': [2,3]
        }

# Run the grid search
search = GridSearchCV(pipe, param_grid)
search.fit(X, y)

print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)

Sample output:

Best parameter (CV score=0.730):
{'impute__strategy': 'median', 'regressor__max_depth': 3}

So GridSearchCV shows that, for this sample data, the best imputation strategy is median when combined with a max_depth of 3.
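To compare every strategy rather than only the winner, one option is to read the per-candidate scores from cv_results_. The sketch below is not part of the original answer. It assumes smaller seeded synthetic data and n_estimators=20 (both chosen only to keep it quick), and it uses the neg_mean_squared_error scoring from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration), seeded for repeatability
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 1.5 * X[:, 0] + 3.2 * X[:, 1] + 2.4
X[rng.integers(0, 300, 60), 0] = np.nan
X[rng.integers(0, 300, 60), 1] = np.nan

pipe = Pipeline(steps=[
    ("impute", SimpleImputer(missing_values=np.nan)),
    ("regressor", RandomForestRegressor(n_estimators=20, random_state=0)),
])
param_grid = {"impute__strategy": ["mean", "median", "most_frequent"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

# cv_results_ holds one entry per candidate; scores are negated MSE,
# so flip the sign to report plain MSE (lower is better)
results = search.cv_results_
for params, mean, std in zip(results["params"],
                             results["mean_test_score"],
                             results["std_test_score"]):
    print(f"{params['impute__strategy']:>14}: MSE = {-mean:.3f} (+/- {std:.3f})")
```

The standard deviation across folds gives a rough sense of whether the gap between strategies is larger than the fold-to-fold noise.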

You can go on to extend the pipeline with other components.
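As one possible extension, a whole pipeline step can itself be treated as a grid parameter, which lets the search compare different imputer classes. The sketch below is an illustration under assumptions: KNNImputer, StandardScaler, the seeded synthetic data, and the small n_estimators value are all choices made here and do not come from the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration), seeded for repeatability
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 1.5 * X[:, 0] + 3.2 * X[:, 1] + 2.4
X[rng.integers(0, 300, 60), 0] = np.nan
X[rng.integers(0, 300, 60), 1] = np.nan

# The imputer runs first, so the scaler and regressor never see NaN
pipe = Pipeline(steps=[
    ("impute", SimpleImputer()),
    ("scale", StandardScaler()),
    ("regressor", RandomForestRegressor(n_estimators=20, random_state=0)),
])

# A list of grids: each dict swaps in one imputer class and tunes
# only the parameters that class actually has
param_grid = [
    {"impute": [SimpleImputer()], "impute__strategy": ["mean", "median"]},
    {"impute": [KNNImputer()], "impute__n_neighbors": [3, 5]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Splitting the grid into separate dicts avoids invalid combinations, such as asking a KNNImputer for a strategy.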
