How can a persisted model improve accuracy?
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')
# Input variables, listed in the same order as the sample row passed to predict().
# 'alcohol_cat' is not a column of the standard UCI file; it is assumed to have
# been added to the CSV beforehand.
variables = ['volatile acidity', 'citric acid', 'chlorides',
             'total sulfur dioxide', 'density', 'sulphates', 'alcohol',
             'alcohol_cat']
X = whitewine_data[variables]
y = whitewine_data['quality']
# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Score only on the held-out rows
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')
# Predict one new sample, keeping the column names used during training
sample = pd.DataFrame([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 8.9, 0]],
                      columns=variables)
predictions = model.predict(sample)
print(f'Predicted Output: {predictions}')
print(f'Accuracy: {accuracy:.2%}')
print(f'F1 Score: {f1:.2%}')
This initial model has an accuracy of 57%.
==============================================================
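import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')
# Variables to be dropped from the data set - NOT THE INPUT VARIABLES
# ('alcohol_cat' and 'isSweet' are assumed to have been added to the CSV beforehand)
variables = ['fixed acidity', 'residual sugar', 'free sulfur dioxide', 'pH',
             'quality', 'isSweet']
X = whitewine_data.drop(variables, axis=1)
y = whitewine_data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DecisionTreeClassifier()
# Only X_train (80% of the rows) is used for fitting
model.fit(X_train, y_train)
joblib.dump(model, 'WhiteWine_Quality_Predictor.joblib')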
This script creates and saves the model.
==============================================================
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')
variables = ['volatile acidity', 'citric acid', 'chlorides',
             'total sulfur dioxide', 'density', 'sulphates', 'alcohol',
             'alcohol_cat']
# Evaluate the reloaded model on every row of the CSV
X_test = whitewine_data[variables]
y_test = whitewine_data['quality']
model = joblib.load('WhiteWine_Quality_Predictor.joblib')
y_pred = model.predict(X_test)
f1 = f1_score(y_test, y_pred, average='weighted')
accuracy = accuracy_score(y_test, y_pred)
# Predict one new sample, keeping the column names used during training
sample = pd.DataFrame([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 10.9, 3]],
                      columns=variables)
predictions = model.predict(sample)
print(f'F1 Score: {f1:.2%}')
print(f'Model Accuracy: {accuracy:.2%}')
print(f'Predicted Output: {predictions}')
After loading the saved model, the accuracy reaches 92%.
Question: Why does loading a saved model give me higher accuracy?
Answer:
This is a very common mistake when you first start working with machine learning algorithms.
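In your second script, you train the algorithm on the training split of winequality-white.csv (X_train, 80% of the rows) and then save it. That part is fine: joblib stores the fitted tree as-is, so saving and reloading does not change its predictions.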
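To confirm that persistence by itself cannot raise accuracy, here is a minimal sketch that saves a fitted tree with joblib, reloads it and compares predictions. It assumes synthetic data from scikit-learn's `make_classification` in place of winequality-white.csv, and writes to a temporary file whose name is only illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption: 8 numeric features, 3 classes)
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Save the fitted model and immediately reload it (illustrative file name)
path = os.path.join(tempfile.mkdtemp(), 'model.joblib')
joblib.dump(model, path)
reloaded = joblib.load(path)

# The reloaded model predicts exactly the same labels on the same inputs
same_predictions = np.array_equal(model.predict(X_test), reloaded.predict(X_test))
print(same_predictions)  # True
```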
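The problem is in your third script: it scores the model on every row of the same file it was trained on. Your second script split off a test set but never kept it, so about 80% of the rows you evaluate are rows the model was fitted on. An unpruned DecisionTreeClassifier (the default, with no max_depth) fits its training rows almost perfectly, so those rows come out close to 100% correct, and only the remaining 20% are genuinely unseen. If the unseen rows score roughly the 57% from your first script, the overall figure is about 0.8 × 100% + 0.2 × 57% ≈ 91%, which matches the 92% you are seeing.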
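Because accuracy is an average over rows, the score on the full file is exactly the row-weighted mix of the accuracy on the training rows and on the held-out rows. A sketch that reproduces the effect, assuming synthetic `make_classification` data in place of the wine CSV and 1000 rows so the 80/20 split is exactly 800/200:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=1)

# An unpruned tree (the default) essentially memorizes its training rows
model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
# Scoring every row, as the third script does, mixes seen and unseen data
full_acc = accuracy_score(y, model.predict(X))

# Row-weighted average of the two partial accuracies
weighted = (len(y_train) * train_acc + len(y_test) * test_acc) / len(y)
print(f'train={train_acc:.3f} test={test_acc:.3f} full={full_acc:.3f}')
print(np.isclose(full_acc, weighted))  # True
```

The "full" number always lands between the two, pulled toward the training accuracy by the 80% of rows the model has already seen.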
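Saving the model is the right approach, but you then need to evaluate it on data it has not been trained on (for example the X_test/y_test split from your second script) rather than on the data set you trained it with.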
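One way to keep a separate train script and evaluation script honest, sketched under assumptions: both call `train_test_split` with the same `random_state` (not present in the original scripts) so the evaluation side can rebuild the identical held-out split, and synthetic data stands in for the CSV. The helper names `load_data`, `split`, `train_and_save` and `load_and_evaluate` are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # fixed seed so both "scripts" produce the same split


def load_data():
    # Stand-in for pd.read_csv('winequality-white.csv', delimiter=';')
    return make_classification(n_samples=1000, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)


def split(X, y):
    return train_test_split(X, y, test_size=0.2, random_state=SEED)


def train_and_save(path):
    X, y = load_data()
    X_train, _, y_train, _ = split(X, y)
    model = DecisionTreeClassifier(random_state=SEED).fit(X_train, y_train)
    joblib.dump(model, path)


def load_and_evaluate(path):
    X, y = load_data()
    _, X_test, _, y_test = split(X, y)  # only rows the model never saw
    model = joblib.load(path)
    y_pred = model.predict(X_test)
    return (accuracy_score(y_test, y_pred),
            f1_score(y_test, y_pred, average='weighted'))


path = os.path.join(tempfile.mkdtemp(), 'WhiteWine_Quality_Predictor.joblib')
train_and_save(path)
accuracy, f1 = load_and_evaluate(path)
print(f'Accuracy: {accuracy:.1%}  F1 Score: {f1:.1%}')
```

Because the split is rebuilt from the seed, the evaluation step sees exactly the 20% of rows the training step held out, so the reloaded model's score matches what you would get in memory. Dumping X_test and y_test with joblib next to the model would achieve the same thing.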