Ensemble methods with scikit-learn

1 vote

3 answers

748 views

Asked 2025-04-17 19:28

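Is there a way to combine different classifiers into one in sklearn? I found the sklearn.ensemble package. It contains many models, such as AdaBoost and Random Forest, but they all use decision trees under the hood. I would like to combine different methods, such as support vector machines (SVM) and logistic regression. Is that possible in sklearn?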

scikit-learn ensemble-learning random-forest logistic-regression svm

3 Answers

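For this task I have been using DESlib, a library built on top of scikit-learn and compatible with its estimator API, though for some reason it is still not widely known.

The library is really useful and offers many combination rules.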

https://deslib.readthedocs.io/en/latest/

https://www.kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection/rules

Answered 2025-04-17 by Python大师


Yes. You can train different models on the same dataset and let each one make its own predictions.

# Import functions to compute accuracy and split data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Import models, including VotingClassifier meta-model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier

# Set seed for reproducibility
SEED = 1
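
The snippets in this answer use X_train, X_test, y_train and y_test without defining them. Here is a minimal sketch of one way to create them. The breast cancer dataset, the 30% test size and the stratified split are assumptions made for illustration and are not part of the original answer; substitute your own feature matrix and labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 1  # same seed as in the answer

# Assumption: a built-in binary classification dataset stands in for your data
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples for testing; stratify keeps the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

print(X_train.shape, X_test.shape)
```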

Now create instances of these models.

# Instantiate lr
lr = LogisticRegression(random_state = SEED)

# Instantiate knn
knn = KNN(n_neighbors = 27)

# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf = 0.13, random_state = SEED)

Then collect them in a list of (name, classifier) tuples. This list is what will later be combined into a single meta-model.

classifiers = [('Logistic Regression', lr), 
               ('K Nearest Neighbours', knn), 
               ('Classification Tree', dt)]

Next, use a for loop to iterate over this predefined list, fitting and scoring each classifier on its own.

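# X_train, X_test, y_train, y_test are assumed to come from
# train_test_split applied to your own feature matrix and labels
for clf_name, clf in classifiers:

    # Fit clf to the training set
    clf.fit(X_train, y_train)

    # Predict labels for the test set
    y_pred = clf.predict(X_test)

    # Calculate accuracy (the signature is accuracy_score(y_true, y_pred))
    accuracy = accuracy_score(y_test, y_pred)

    # Report clf's accuracy on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy))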

Finally, evaluate a voting classifier. It takes the outputs of the models in the list and assigns each label by majority vote.

# Voting Classifier
# Instantiate a VotingClassifier vc (the default voting='hard' is a majority vote)
vc = VotingClassifier(estimators=classifiers)

# Fit vc to the training set
vc.fit(X_train, y_train)

# Predict labels for the test set
y_pred = vc.predict(X_test)

# Calculate accuracy score (the signature is accuracy_score(y_true, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))
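
The question asked specifically about combining an SVM with logistic regression. The same VotingClassifier pattern should work for that too. The sketch below is one possible setup, not the answerer's code: the dataset, the scaling pipelines and the third estimator (KNN, added so that a three-way majority vote cannot tie on a binary problem) are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SEED = 1

# Assumption: a built-in dataset stands in for your own data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

# SVM, logistic regression and KNN are all sensitive to feature scale,
# so each one is wrapped in a pipeline with a StandardScaler
svm = make_pipeline(StandardScaler(), SVC(random_state=SEED))
logreg = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=SEED)
)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Hard voting only needs predict(), so SVC does not need probability estimates
svm_lr_vc = VotingClassifier(
    estimators=[("svm", svm), ("logreg", logreg), ("knn", knn)],
    voting="hard",
)
svm_lr_vc.fit(X_train, y_train)

svm_lr_accuracy = accuracy_score(y_test, svm_lr_vc.predict(X_test))
print("SVM + LR + KNN voting: {:.3f}".format(svm_lr_accuracy))
```

Any estimator that follows the scikit-learn fit/predict interface can be placed in the estimators list, so the ensemble is not limited to tree-based models.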

Answered 2025-04-17 by Python大师


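Are you looking for simple voting? When I last checked, that was not built in, although scikit-learn now provides VotingClassifier in sklearn.ensemble. Either way, you can average the predicted probability scores yourself. Alternatively, you can use LabelBinarizer to convert each model's predictions into binary indicator form and then average those, which amounts to voting.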

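Even if you do not care about the probabilities themselves, averaging the predicted probabilities may be more robust than simple voting. You would have to try it to find out, though.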
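
Both manual approaches described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the iris dataset and the three base models are chosen here for the example. A three-class problem is used on purpose, because for binary labels LabelBinarizer returns a single column and the argmax step below would not apply.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Assumption: a built-in three-class dataset stands in for your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Assumption: any classifiers with predict() and predict_proba() would do
models = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(random_state=1),
]
for model in models:
    model.fit(X_train, y_train)

# Option 1: average the predicted class probabilities, then take the argmax
avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
y_pred_avg = models[0].classes_[np.argmax(avg_proba, axis=1)]

# Option 2: one-hot encode each model's hard predictions and average them;
# each row then holds the share of votes each class received
lb = LabelBinarizer().fit(y_train)
vote_share = np.mean([lb.transform(m.predict(X_test)) for m in models], axis=0)
y_pred_vote = lb.classes_[np.argmax(vote_share, axis=1)]

print("Averaged probabilities:", np.mean(y_pred_avg == y_test))
print("Averaged one-hot votes:", np.mean(y_pred_vote == y_test))
```

Note that argmax picks the first class on a tie, so with three models that each vote for a different class, option 2 falls back to the lowest class index.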

Answered 2025-04-17 by Python大师
