Classification metrics can't handle a mix of binary and continuous targets

Posted 2024-04-19 18:08:51


My input data file has the following format:

gold,gold,gold,gold

T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

I am trying to predict the first column (gold) from the values of the remaining columns. Here is the code I am using:

import pandas as pd
import numpy as np
dataset = pd.read_csv('data1extended.txt', sep=',')
# Convert T into 1 and N into 0
dataset['gold'] = dataset['gold'].astype('category').cat.codes

print(dataset.head())
row_count, column_count = dataset.shape
X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

The last 3 lines of my code cause an error. How can I fix it?

This line causes the error: print(confusion_matrix(y_test,y_pred)). I printed y_test and y_pred, and this is what I got:
y_test is: [0 0 … 0 0 0]
y_pred is: [0.0007123 0.00402548 0.00402548 … 0.00402548 0.02651928 0.00816086]
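For reference, the error can be reproduced in isolation. The sketch below uses a few made-up values shaped like the printed output (they are assumptions, not the asker's data): integer 0/1 labels on one side and real-valued scores on the other. scikit-learn should reject the pair with a ValueError whose message names both target types, matching the title of this question; exact wording may vary between versions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative values shaped like the printed output (not the asker's real data).
y_test = np.array([0, 0, 1, 0, 0])
y_pred = np.array([0.0007123, 0.00402548, 0.9, 0.02651928, 0.00816086])

message = None
try:
    confusion_matrix(y_test, y_pred)
except ValueError as err:
    # scikit-learn refuses to compare binary labels with continuous scores
    message = str(err)

print(message)
```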


1 answer
User
#1 · Posted 2024-04-19 18:08:51

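You are using RandomForestRegressor, which produces continuous output (real numbers), whereas confusion_matrix, like classification_report and accuracy_score, expects categorical output, i.e. discrete class labels such as 0, 1, 2, and so on.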
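One way to see the distinction the metrics draw is scikit-learn's `type_of_target` helper, which classifies an array of targets. A minimal sketch with assumed example values: integer labels are reported as binary, raw regressor scores as continuous, and floats that are all whole numbers (here obtained by rounding) count as binary again.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative arrays (assumed values, not the asker's data).
y_true = np.array([0, 1, 0, 0])                # discrete class labels
y_reg = np.array([0.0007, 0.81, 0.004, 0.02])  # regressor-style real-valued scores
y_rounded = y_reg.round()                      # floats, but all whole numbers

print(type_of_target(y_true))     # binary
print(type_of_target(y_reg))      # continuous
print(type_of_target(y_rounded))  # binary
```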

Since you are trying to predict a class (1 or 0), you have two options:

1.) Use RandomForestClassifier instead of RandomForestRegressor. The classifier outputs 0 or 1 directly, and you can pass those predictions to the metrics. (Recommended)

2.) If you want to keep the regressor's real-valued output, you can apply a threshold (for example 0.5):

y_pred = (y_pred >= threshold).astype(int)

This converts every prediction below the threshold to 0 and every other prediction to 1; then use the result to compute the metrics.
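Both options can be sketched end to end. This is an illustration under stated assumptions: the data come from `make_classification` and are binarized to mimic the 0/1 columns (not the asker's file), the scaling step from the question is left out for brevity, and the names `clf`, `reg` and `threshold` (set to 0.5) are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (assumption: not the real file).
X, y = make_classification(n_samples=200, n_features=26, random_state=0)
X = (X > 0).astype(int)  # binarize features to resemble the 0/1 columns

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Option 1 (recommended): a classifier predicts discrete labels directly.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)
y_pred_clf = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred_clf))
print(classification_report(y_test, y_pred_clf))
print(accuracy_score(y_test, y_pred_clf))

# Option 2: keep the regressor but threshold its continuous output.
threshold = 0.5  # hypothetical cut-off; tune it for your data
reg = RandomForestRegressor(n_estimators=20, random_state=0)
reg.fit(X_train, y_train)
y_pred_reg = (reg.predict(X_test) >= threshold).astype(int)  # below -> 0, else 1

print(confusion_matrix(y_test, y_pred_reg))
print(accuracy_score(y_test, y_pred_reg))
```

The classifier route is usually simpler because the class decision is made inside the model; the threshold route leaves the cut-off as one more parameter you have to pick.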
