How does persisting a model improve its accuracy?

Asked on 2025-04-14 17:26
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

variables = ['alcohol_cat', 'alcohol', 'sulphates', 'density',
             'total sulfur dioxide', 'citric acid', 'volatile acidity',
             'chlorides']

X = whitewine_data[variables]
y = whitewine_data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')

predictions = model.predict([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 8.9, 0]])
print(f'Predicted Output: {predictions}')
print(f'Accuracy: {accuracy * 100}%')
print(f'F1 Score: {f1 * 100}% ')

The accuracy of this initial model is 57%.

==============================================================

import joblib  # needed for joblib.dump below
# (pandas, DecisionTreeClassifier and train_test_split are imported as in the first script)

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

# Variables to be dropped from the data set - NOT THE INPUT VARIABLES
variables = ['fixed acidity', 'residual sugar', 'free sulfur dioxide',
             'pH', 'quality', 'isSweet']

X = whitewine_data.drop(variables, axis=1)
y = whitewine_data['quality']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

joblib.dump(model, 'WhiteWine_Quality_Predictor.joblib')

This is where the saved model is created.

==============================================================

import joblib  # needed for joblib.load below
# (pandas, accuracy_score and f1_score are imported as in the first script)

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

variables = ['volatile acidity', 'citric acid', 'chlorides',
             'total sulfur dioxide', 'density', 'sulphates', 'alcohol',
             'alcohol_cat']

X_test = whitewine_data[variables]
y_test = whitewine_data['quality']

model = joblib.load('WhiteWine_Quality_Predictor.joblib')

y_pred = model.predict(X_test)

f1 = f1_score(y_test, y_pred, average='weighted')
accuracy = accuracy_score(y_test, y_pred)
predictions = model.predict([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 10.9, 3]])

print(f'F1 Score: {f1 * 100}%')
print(f'Model Accuracy: {accuracy * 100}%')
print(f'Predicted Output: {predictions}')

After loading the saved model, the accuracy reaches 92%.

Question: why does loading a saved model show an improvement in accuracy?

1 Answer


This is a very common mistake when you are just getting started with machine learning algorithms.

In your second script you train the algorithm on the winequality-white.csv dataset and then save it; that part is fine.

The problem is that in your third script you evaluate on exactly the same dataset you trained on. You are effectively predicting the very samples the model was fitted to, so of course the scores come out close to 100%: an unpruned decision tree essentially memorises its training data.
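A quick way to see this (a minimal sketch that reuses model, X_train, X_test, y_train, y_test and accuracy_score from your first script) is to compare the score on the data the tree was fitted to with the score on the held-out split:

# Hypothetical check, reusing the objects from the first script:
# an unpruned decision tree scores near 100% on its own training data,
# while the held-out split gives the honest estimate.
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f'Train accuracy: {train_accuracy * 100:.1f}%')
print(f'Test accuracy:  {test_accuracy * 100:.1f}%')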

Saving the algorithm is the right approach, but you then need to make predictions on data the model has never seen, not on the dataset you trained it with.
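One way to set this up (a minimal sketch; the hold-out file name WhiteWine_holdout.joblib and random_state=42 are just illustrative choices) is to persist the held-out split at training time alongside the model, and then score the loaded model only on that unseen data:

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# --- training script: split once, fit on the training part, persist model and hold-out ---
data = pd.read_csv('winequality-white.csv', delimiter=';')
X = data.drop(['quality'], axis=1)
y = data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

joblib.dump(model, 'WhiteWine_Quality_Predictor.joblib')
joblib.dump((X_test, y_test), 'WhiteWine_holdout.joblib')  # keep the unseen split for later

# --- evaluation script: load both and score only on data the model never saw ---
model = joblib.load('WhiteWine_Quality_Predictor.joblib')
X_test, y_test = joblib.load('WhiteWine_holdout.joblib')
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f'Hold-out accuracy: {accuracy * 100:.1f}%')

With this layout the number printed by the evaluation script is an honest estimate, because the loaded model is only ever scored on rows it was never fitted to.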
