Train multiple classifiers and compare metrics

Posted 2024-03-29 14:36:20


I want to evaluate several different classifiers in a single run and collect the results in a DataFrame.

# Let's create some test data
import pandas as pd
import numpy as np
import random
import random
integers = pd.DataFrame(np.random.randint(0,100,size=(50, 1)), columns=list('I'))
strings = pd.DataFrame([random.choice('ab') for _ in range(50)], columns=list('S'))
df2 = pd.concat([strings,integers], axis=1)
df2.head()
    S   I
0   a   5
1   a   31
2   b   84
3   a   79
4   b   92


# Train - Test
from sklearn.model_selection import train_test_split

X = df2[["I"]].values
y = df2["S"]
X_train, X_test, y_train, y_test = train_test_split(X, y)

#Load libraries
from sklearn import metrics
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression


#Classifiers
classifiers = [KNeighborsClassifier(30),
               DecisionTreeClassifier(),
               RandomForestClassifier(),
               AdaBoostClassifier(),
               LogisticRegression()]


n_range = list(range(1, 10))
RandomForestClf = []

for n in n_range:
    # name = clf.__class__.__name__
    model = RandomForestClassifier(n_estimators=n)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    RandomForestClf.append(scores.mean())

# Build the DataFrame once, after all mean scores have been collected
data_frame = pd.DataFrame({"Random Forest": RandomForestClf})

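I can't get the different classifiers to go through the for loop.

How do I set up the for loop so that each classifier is run and the results are then written to a pandas DataFrame?

My current for loop only works when the model is named explicitly in the code.

I'm new to Python, sorry.

Thanks for your help.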


1 answer

Answer 1 · Posted 2024-03-29 14:36:20

You can define the DataFrame outside the for loop, then look up each classifier's name from the object's type and use it as the column name:

import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression


#Classifiers 
classifiers = [KNeighborsClassifier(30),
                DecisionTreeClassifier(),
                RandomForestClassifier(),
                AdaBoostClassifier(),
                LogisticRegression()]

from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

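k = 5
# One row per cross-validation fold; one column is added per classifier
preds = pd.DataFrame(index=[*range(k)])

for cls in classifiers:
    scores = cross_val_score(cls, X, y, cv=k, scoring="accuracy")
    # Use the class name (e.g. "KNeighborsClassifier") as the column label
    preds[type(cls).__name__] = scores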
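One caveat with `type(cls).__name__`: if the list contains the same class twice with different parameters (for example two random forests with different `n_estimators`), both get the same column name and the second silently overwrites the first. A minimal sketch that avoids this by keying a dict with hand-chosen labels; the labels, `random_state=0` and `max_iter=1000` are my own assumptions, not part of the answer above:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical, hand-chosen labels: any unique strings work as column names
classifiers = {
    "KNN (k=30)": KNeighborsClassifier(30),
    "RandomForest (10 trees)": RandomForestClassifier(n_estimators=10, random_state=0),
    "RandomForest (100 trees)": RandomForestClassifier(n_estimators=100, random_state=0),
    # Higher max_iter to reduce the chance of a convergence warning on unscaled data
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

k = 5
scores = pd.DataFrame(index=range(k))  # one row per CV fold
for name, clf in classifiers.items():
    # Each label maps to its own column, so same-class models do not collide
    scores[name] = cross_val_score(clf, X, y, cv=k, scoring="accuracy")

print(scores)
```

The same pattern works for the question's `n_estimators` sweep: build the dict with a comprehension such as `{f"RF n={n}": RandomForestClassifier(n_estimators=n) for n in range(1, 10)}`.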

In this case you get:

print(preds)
   KNeighborsClassifier  DecisionTreeClassifier  RandomForestClassifier  \
0              0.900000                0.966667                0.966667   
1              0.966667                0.966667                0.966667   
2              0.933333                0.900000                0.933333   
3              0.900000                0.966667                0.966667   
4              1.000000                1.000000                1.000000   

   AdaBoostClassifier  LogisticRegression  
0            0.966667            0.966667  
1            0.933333            1.000000  
2            0.900000            0.933333  
3            0.933333            0.966667  
4            1.000000            1.000000   
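Since the question asks to compare metrics, it may help to score more than accuracy. A minimal sketch using scikit-learn's `cross_validate`, which accepts a list of scorer names and returns one `test_<name>` array per metric; the choice of `"f1_macro"` as a second metric and the reduced classifier list are assumptions for illustration:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classifiers = [KNeighborsClassifier(30),
               DecisionTreeClassifier(random_state=0),
               LogisticRegression(max_iter=1000)]
metric_names = ["accuracy", "f1_macro"]

rows = {}
for clf in classifiers:
    # The result dict has keys such as "test_accuracy" and "test_f1_macro"
    cv = cross_validate(clf, X, y, cv=5, scoring=metric_names)
    rows[type(clf).__name__] = {m: cv[f"test_{m}"].mean() for m in metric_names}

# One row per classifier, one column per metric (mean over the 5 folds)
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary.sort_values("accuracy", ascending=False))
```

Averaging over folds gives one comparable number per classifier and metric; the per-fold arrays are still available in `cv` if you also want the spread.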

Here is a related answer that plots multiple confusion matrices from a list of classifiers, in case you find that useful as well.
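The linked answer is not reproduced here; as an independent sketch, one way to get a confusion matrix per classifier from the same loop is `cross_val_predict`, which returns an out-of-fold prediction for every sample, followed by `confusion_matrix`:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classifiers = [KNeighborsClassifier(30),
               DecisionTreeClassifier(random_state=0),
               LogisticRegression(max_iter=1000)]

matrices = {}
for clf in classifiers:
    # Each sample is predicted by a model trained on the folds that exclude it
    y_pred = cross_val_predict(clf, X, y, cv=5)
    matrices[type(clf).__name__] = confusion_matrix(y, y_pred)

for name, cm in matrices.items():
    print(name)
    print(cm)
```

To draw them instead of printing, `ConfusionMatrixDisplay(confusion_matrix=cm).plot()` from `sklearn.metrics` can be called for each matrix (requires matplotlib).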
