ValueError: Number of features of the model must match the input. Model n_features is 947 and input n_features is 15

Posted 2024-04-20 11:23:06


I'm training a random forest on this dataset: https://archive.ics.uci.edu/ml/datasets/auto+mpg
But when I try to predict something with the saved model, it throws this error:

ValueError: Number of features of the model must match the input. Model n_features is 947 and input n_features is 15

Here is my training script:

import joblib  # to save the model
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler  # for normalization
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import r2_score

data = pd.read_csv('auto-mpg.csv',sep = ',')

data['horsepower'] = data['horsepower'].replace('?','100')

x = data.iloc[:,1:].values
y = data.iloc[:,0].values

lb = LabelEncoder()
x[:,7] = lb.fit_transform(x[:,7])

onehot = OneHotEncoder()
x = onehot.fit_transform(x).toarray()

xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size = 0.2,random_state = 0)

sc = StandardScaler()
x = sc.fit_transform(x)

rfr = RandomForestRegressor(n_estimators = 200,random_state = 0)
rfr.fit(xtrain,ytrain)

ypred_rfr = rfr.predict(xtest)
print('Acuracia:',round(r2_score(ytest,ypred_rfr)*100,2),'%')

joblib.dump(rfr,'randon-forest.model')

And here is the script where the error occurs:

import joblib  # to save the model
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv('teste.csv',sep = ',')
print(data.columns);
logit = joblib.load('randon-forest.model')

onehot = OneHotEncoder()
data = onehot.fit_transform(data).toarray()

sc = StandardScaler()
data = sc.fit_transform(data)

# build a data vector
dados_vet = pd.DataFrame(data)
print(data)
# classify this vector with logit_bank
result_predict = logit.predict(dados_vet)

print('Logit Bank')
print(result_predict)


1 answer
User
Reply #1 · Posted 2024-04-20 11:23:06

TL;DR:
The data you predict on must have the same number of features as the data the model was trained on.

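A more detailed answer:
You trained the model on auto-mpg.csv, which I guess ends up with 947 features (columns) after your preprocessing, with one more column used as the label. The data in teste.csv (I don't know what it contains) apparently ends up with only 15 features. A model trained on 947 features cannot predict on 15-feature data.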
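
To make the mismatch concrete, here is a minimal sketch of one likely cause. It assumes the feature count is produced by the one-hot encoding step rather than by the raw CSV: the UCI auto-mpg data has only nine columns, so 947 most likely comes from OneHotEncoder turning every distinct value into its own column. Fitting a second encoder on the prediction data then yields a different width. The toy arrays below are made up for illustration and are not taken from either CSV file.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column, different values at predict time.
train_cat = np.array([["ford"], ["toyota"], ["honda"], ["ford"]], dtype=object)
train_y = np.array([18.0, 30.0, 33.0, 17.0])
new_cat = np.array([["toyota"], ["bmw"]], dtype=object)

# Training: fit the encoder once and keep the fitted object.
encoder = OneHotEncoder(handle_unknown="ignore")
x_train = encoder.fit_transform(train_cat).toarray()  # 3 categories -> 3 columns
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(x_train, train_y)

# Prediction, done right: reuse the fitted encoder with transform(), not fit_transform().
# The unseen category "bmw" becomes an all-zero row instead of a new column.
x_new = encoder.transform(new_cat).toarray()
pred = model.predict(x_new)

# Prediction, done as in the question: a fresh encoder fitted on the new data.
x_bad = OneHotEncoder().fit_transform(new_cat).toarray()  # 2 categories -> 2 columns
try:
    model.predict(x_bad)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True  # feature count differs from what the model was trained on
```

In the original scripts this would mean saving the fitted encoder next to the model (for example with a second joblib.dump call) and loading both in the prediction script. It is also worth considering one-hot encoding only the truly categorical columns, since a random forest can use numeric columns such as weight directly.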

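A simple example: suppose you have a model that predicts a house's price from its number of rooms and the year it was built, using this formula: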

price = number_of_rooms * 5 + (year-2000) * 20

What your code is trying to do is equivalent to giving this example model only the number of rooms. The model also has to "know" the year.
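
The same toy model can be written as a few lines of Python. This is only an illustration of the formula above; the function name predict_price and the explicit length check are hypothetical and simply mimic the kind of validation scikit-learn performs.

```python
def predict_price(features):
    """Toy model for the formula above; expects [number_of_rooms, year]."""
    n_features_expected = 2
    if len(features) != n_features_expected:
        raise ValueError(
            f"Model n_features is {n_features_expected} "
            f"and input n_features is {len(features)}"
        )
    number_of_rooms, year = features
    return number_of_rooms * 5 + (year - 2000) * 20


# Both features supplied: 3 * 5 + (2010 - 2000) * 20 = 215
full_input_price = predict_price([3, 2010])

# Only the number of rooms supplied: the model cannot produce a price.
try:
    predict_price([3])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```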
