Which R-squared score is more helpful?

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# `data` is a pandas DataFrame loaded earlier (the loading step is not shown in the question)
data.drop('Movie Title', axis=1, inplace=True)
features = data.loc[:, data.columns != 'worldwide_gross_usd']
charges = data['worldwide_gross_usd']

# First split: random_state=42
X_train, X_test, y_train, y_test = train_test_split(features, charges, random_state=42, test_size=0.2)
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)
print('Trained R-squared score: ', regr.score(X_train, y_train))
print('Tested R-squared score: ', regr.score(X_test, y_test))

# Second split: same data and model, only random_state changes to 12
X_train, X_test, y_train, y_test = train_test_split(features, charges, random_state=12, test_size=0.2)
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)
print('Trained R-squared score: ', regr.score(X_train, y_train))
print('Tested R-squared score: ', regr.score(X_test, y_test))

1 answer

User

#1 · Posted on 2024-06-16 12:11:04

The R-squared score is a quick way to judge a regression model, but it is not a good one on its own.

Think of it like this:
You have 3 points on a 2D plane (say p1, p2, p3).
In the first case, you fit the regression line using p1 and p2, then test it on p3, and get a score r1.
Next, you fit the regression line using p2 and p3, then test it on p1, and get a score r2.
The two scores differ only because the split differs, so you cannot rely on the R-squared score alone when it changes with the random state.
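To make the analogy concrete, here is a minimal sketch that refits the same model on several splits and prints the scores. It uses synthetic data from `make_regression` as a stand-in (an assumption; this is not the movie dataset from the question), and the list of seeds is arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the movie dataset from the question).
X, y = make_regression(n_samples=200, n_features=5, noise=30.0, random_state=0)

test_scores = []
for seed in (42, 12, 0, 1, 2):  # arbitrary seeds, chosen only for illustration
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    regr = LinearRegression().fit(X_train, y_train)
    train_r2 = regr.score(X_train, y_train)
    test_r2 = regr.score(X_test, y_test)
    test_scores.append(test_r2)
    print(f"seed={seed}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")

# The model and the data never change; only the split does.
print("spread of test R^2:", max(test_scores) - min(test_scores))
```

If the spread is of the same order as the gap between your two runs, the gap probably says more about the split than about the model.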

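Inference:

If all data points are equally relevant, the higher R-squared score on the test set is the better one.
If you are not sure about the relevance of your dataset, check other parameters/methods to decide which R-squared score is better.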

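Other parameters/methods:

You should draw a residual plot for both cases. Check which one has residuals with a mean close to zero and a variance close to 1 (for most datasets); that one is better. If the residual plot for either case shows some pattern, that case is not good and can be improved. If the residual plot for either case contains outliers, that case is also not good and can be improved.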
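A rough sketch of that residual check for the two random states from the question is below. It again uses synthetic stand-in data rather than the movie dataset. Dividing the test residuals by the standard deviation of the training residuals is my own assumption, made so that "variance close to 1" does not depend on the units of the target; `residual_summary` and `plot_residuals` are hypothetical helper names, not part of scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the movie dataset from the question).
X, y = make_regression(n_samples=200, n_features=5, noise=30.0, random_state=0)


def residual_summary(seed):
    """Fit on one split; return test predictions and scaled test residuals."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    regr = LinearRegression().fit(X_train, y_train)
    train_resid = y_train - regr.predict(X_train)
    y_pred = regr.predict(X_test)
    test_resid = y_test - y_pred
    # Assumption: scale by the training residual std so splits are comparable.
    scaled = test_resid / train_resid.std()
    return y_pred, scaled


def plot_residuals(y_pred, scaled, title):
    """Scatter of scaled residuals against predictions (needs matplotlib)."""
    import matplotlib.pyplot as plt

    plt.scatter(y_pred, scaled)
    plt.axhline(0.0, color="black", linewidth=1)
    plt.xlabel("predicted value")
    plt.ylabel("scaled residual")
    plt.title(title)
    plt.show()


summaries = {}
for seed in (42, 12):
    y_pred, scaled = residual_summary(seed)
    summaries[seed] = (scaled.mean(), scaled.var())
    print(f"random_state={seed}: mean={scaled.mean():.3f}, variance={scaled.var():.3f}")
    # Uncomment to look for patterns or outliers by eye:
    # plot_residuals(y_pred, scaled, f"random_state={seed}")
```

A mean far from zero, a scaled variance well above 1, or a visible curve or funnel shape in the plot would each suggest that the split, or the model itself, deserves a closer look.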

Note: Suppose you want to predict house prices, and your data includes the area of the house, its location, BHK, the number of people who previously lived there, and so on. House prices depend much more on the area of the house than on the number of previous occupants, so the two are not equally relevant. This is what I mean by "equally relevant".
