I'm using a random forest to model this dataset: https://archive.ics.uci.edu/ml/datasets/auto+mpg
But when I try to predict something, it throws an error:
ValueError: Number of features of the model must match the input. Model n_features is 947 and input n_features is 15
Here is my file:
import joblib  # to save the model
import pandas as pd
from sklearn.ensemble import RandomForestRegressor  # the regressor is what is instantiated below
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler  # to normalize
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import r2_score
data = pd.read_csv('auto-mpg.csv',sep = ',')
data['horsepower'] = data['horsepower'].replace('?','100')
x = data.iloc[:,1:].values
y = data.iloc[:,0].values
lb = LabelEncoder()
x[:,7] = lb.fit_transform(x[:,7])
onehot = OneHotEncoder()
x = onehot.fit_transform(x).toarray()
xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size = 0.2,random_state = 0)
sc = StandardScaler()
x = sc.fit_transform(x)
rfr = RandomForestRegressor(n_estimators = 200,random_state = 0)
rfr.fit(xtrain,ytrain)
ypred_rfr = rfr.predict(xtest)
print('Acuracia:',round(r2_score(ytest,ypred_rfr)*100,2),'%')
joblib.dump(rfr,'randon-forest.model')
And here is the script where the error occurs:
import joblib  # to load the saved model
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
data = pd.read_csv('teste.csv',sep = ',')
print(data.columns);
logit = joblib.load('randon-forest.model')
onehot = OneHotEncoder()
data = onehot.fit_transform(data).toarray()
sc = StandardScaler()
data = sc.fit_transform(data)
# build a data vector
dados_vet = pd.DataFrame(data)
print(data)
# classify this vector with logit_bank
result_predict = logit.predict(dados_vet)
print('Logit Bank')
print(result_predict)
TL;DR:
The data you predict on must have the same number of features as the data the model was trained on.
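A minimal sketch of one way to keep the feature count consistent, assuming scikit-learn is installed. The toy rows, the `handle_unknown="ignore"` setting, and every variable name here are illustrative assumptions, not taken from the question's files. The idea is to fit the encoder once on the training data and then call only `transform` on new data, instead of fitting a fresh encoder at prediction time:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

# Toy training data (made up): two categorical columns and a numeric target.
X_train = np.array([
    ["usa", "4"],
    ["japan", "4"],
    ["europe", "6"],
    ["usa", "8"],
])
y_train = np.array([30.0, 33.0, 24.0, 15.0])

# Fit the encoder once, on the training data only.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_enc = encoder.fit_transform(X_train).toarray()

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train_enc, y_train)

# A new row to predict on.
X_new = np.array([["japan", "6"]])

# Fitting a brand-new encoder on the new row yields a different width,
# which is the kind of mismatch that triggers the ValueError.
refit_width = OneHotEncoder().fit_transform(X_new).toarray().shape[1]

# The already-fitted encoder reproduces the training column layout.
X_new_enc = encoder.transform(X_new).toarray()
prediction = model.predict(X_new_enc)

# Widths: training matrix, re-fitted encoder, reused encoder.
print(X_train_enc.shape[1], refit_width, X_new_enc.shape[1])
```

In the two-script setup from the question, this would probably mean saving the fitted encoder (and scaler, if one is actually applied to the training matrix) with `joblib.dump` next to the model, then loading them in the prediction script and calling `transform` rather than `fit_transform`.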
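More detailed answer:
You trained the model on auto-mpg.csv, and after preprocessing the training matrix had 947 features (columns). The raw file has far fewer columns than that; the count is so high because OneHotEncoder was fitted on every column, so each distinct value in each column became its own feature. The data in teste.csv (I don't know what it contains) ends up with only 15 features once a new OneHotEncoder is fitted on it, so you can't use a model trained on a different number of features to predict on 15-feature data.
A simple example: suppose you have a model that predicts a house's price from the year it was built and its number of rooms. What you are trying to do in your code is equivalent to giving that example model only the number of rooms. The model also has to "know" the year.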
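The house-price analogy can be sketched in a few lines of plain Python. The coefficients and the `predict_price` helper are hypothetical, chosen only to illustrate the point; they are not the formula from the original answer:

```python
# Hypothetical coefficients, chosen only for illustration.
PRICE_PER_ROOM = 20_000.0
PRICE_PER_YEAR = 500.0
BASE_YEAR = 1900


def predict_price(features):
    """Toy model that expects exactly two features: [year_built, rooms]."""
    if len(features) != 2:
        raise ValueError(f"model expects 2 features, got {len(features)}")
    year_built, rooms = features
    return PRICE_PER_YEAR * (year_built - BASE_YEAR) + PRICE_PER_ROOM * rooms


# Both features supplied: the model can produce a price.
print(predict_price([2000, 3]))

# Only the number of rooms supplied: the model has no year to work with.
try:
    predict_price([3])
except ValueError as err:
    print("error:", err)
```

The random forest behaves the same way: it learned splits on 947 columns, so a 15-column input gives it nothing to evaluate most of those splits against.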