SkLearn Decision Tree: Overfitting or Bug?

```python
import numpy as np
from sklearn import tree


# compute the misclassification rate (fraction of wrong predictions)
def compute_error(x, y, model):
    yfit = model.predict(x.toarray())
    return np.mean(y != yfit)


def drawLearningCurve(model, xTrain, yTrain, xTest, yTest):
    sizes = np.linspace(2, 25000, 50).astype(int)
    train_error = np.zeros(sizes.shape)
    crossval_error = np.zeros(sizes.shape)
    for i, size in enumerate(sizes):
        model = model.fit(xTrain[:size, :].toarray(), yTrain[:size])
        # compute the validation error
        crossval_error[i] = compute_error(xTest, yTest, model)
        # compute the training error
        train_error[i] = compute_error(xTrain[:size, :], yTrain[:size], model)
    # return the curves so they can be plotted
    return sizes, train_error, crossval_error


clf = tree.DecisionTreeClassifier()
drawLearningCurve(clf, xtr, ytr, xte, yte)
```
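For comparison, scikit-learn ships a `learning_curve` helper in `sklearn.model_selection` that runs the same grow-refit-score loop, scoring on cross-validation folds rather than on a fixed test set. The sketch below is only an illustration under assumptions: it uses synthetic data from `make_classification` because the asker's `xtr`/`ytr` are not available, and the sizes and fold count are arbitrary. With an unrestricted tree, the training error should stay at 0 at every size, which is what a fully grown tree is expected to do rather than a bug.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption: xtr/ytr not available).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# learning_curve refits the estimator on growing training subsets and scores
# each fit both on that subset and on the held-out cross-validation fold.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
    shuffle=True,
    random_state=0,
)

# Convert accuracy to misclassification rate and average over the CV folds.
train_error = 1 - train_scores.mean(axis=1)
val_error = 1 - val_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_error, val_error):
    print(f"{int(n):4d} samples: train error {tr:.3f}, validation error {va:.3f}")
```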

1 answer

Anonymous user · posted 2024-06-07 06:19:31

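The comments already give some very useful pointers. I would just add one parameter you may want to tune: max_depth. By default (max_depth=None) a DecisionTreeClassifier keeps splitting until every leaf is pure, so it can fit the training set perfectly; limiting the depth is one way to curb that overfitting.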
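A minimal sketch of the effect of max_depth, assuming synthetic data from `make_classification` in place of the asker's `xtr`/`ytr`/`xte`/`yte`; the depths tried (and the added label noise via `flip_y`) are arbitrary illustration values. It prints the training and test error for each setting; the unrestricted tree should show a training error of exactly 0, while shallower trees trade some training error for a simpler model.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption) with some label noise, split into train/test.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [None, 3, 6, 10]:
    # max_depth=None (the default) lets the tree grow until its leaves are pure.
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = 1 - accuracy_score(y_train, clf.predict(X_train))
    test_err = 1 - accuracy_score(y_test, clf.predict(X_test))
    results[depth] = (train_err, test_err)
    print(f"max_depth={depth}: actual depth={clf.get_depth()}, "
          f"train error={train_err:.3f}, test error={test_err:.3f}")
```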

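What worries me more is that your compute_error function looks odd. An error of 0 says your classifier makes no mistakes on the training set. But if it did make some mistakes, this function would not tell you how many: when y and yfit are plain Python lists, y != yfit compares the two lists as whole objects and returns a single True or False, so np.mean returns either 0.0 or 1.0: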

```python
>>> import numpy as np
>>> np.mean([0,0,0,0] != [0,0,0,0])  # perfect match, error is 0
0.0
>>> np.mean([0,0,0,0] != [1, 1, 1, 1])  # 100% wrong answers
1.0
>>> np.mean([0,0,0,0] != [1, 1, 1, 0])  # 75% wrong answers
1.0
>>> np.mean([0,0,0,0] != [1, 1, 0, 0])  # 50% wrong answers
1.0
>>> np.mean([0,0,0,0] != [1, 1, 2, 2])  # 100% wrong answers (every label differs)
1.0
```

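What you need is an elementwise comparison: convert both sides to NumPy arrays first, e.g. np.mean(np.asarray(y) != np.asarray(yfit)) for the error rate, or np.sum(...) for the number of errors. Better still, use one of the metrics that ship with sklearn, such as accuracy_score (the error rate is then 1 - accuracy_score(y, yfit)).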
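A sketch of the fix, redoing the comparisons from the session above with NumPy arrays so that != is elementwise, plus a version of compute_error built on `sklearn.metrics.accuracy_score`. It keeps the question's signature and assumes, as the original code does, that `x` is a SciPy sparse matrix (hence `.toarray()`); it is one reasonable way to write the helper, not the only one.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def compute_error(x, y, model):
    """Misclassification rate of `model` on (x, y).

    Assumes x is a scipy sparse matrix, as in the question's code.
    """
    yfit = model.predict(x.toarray())
    return 1.0 - accuracy_score(y, yfit)


# The same comparisons as in the session above, but with NumPy arrays.
y_true = np.array([0, 0, 0, 0])
predictions = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 2, 2]]

# Elementwise != gives one boolean per sample; their mean is the error rate.
rates = [float(np.mean(y_true != np.asarray(p))) for p in predictions]
# accuracy_score gives the same numbers as 1 - accuracy.
acc_rates = [float(1.0 - accuracy_score(y_true, p)) for p in predictions]

print(rates)  # → [0.0, 1.0, 0.75, 0.5, 1.0]
```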
