What does an R^2 of 0.0 mean in lm.score()?

Posted 2024-04-25 13:06:48


On this page, R^2 is defined as:

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

I can't understand this sentence:

A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

Apart from the case where the constant model predicts y_true.mean(), how can a constant model give an R^2 of 0.0?

Thanks


1 answer
User
#1 · Posted 2024-04-25 13:06:48

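If you fit a constant model (that is, every predictor value is 1), you get an intercept-only model whose intercept is the mean of y, because the mean is the constant that explains the most variance (it minimizes the residual sum of squares).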
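The claim about the mean can be checked with a short derivation. This is a sketch using the symbols u and v from the quoted definition; n (the number of samples), c (the constant prediction), and \bar{y} (the mean of y_true) are notation introduced here, and v > 0 is assumed.

```latex
u(c) = \sum_{i=1}^{n} (y_i - c)^2
     = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - c)^2
     = v + n(\bar{y} - c)^2

R^2(c) = 1 - \frac{u(c)}{v} = -\frac{n(\bar{y} - c)^2}{v} \le 0,
\qquad R^2(c) = 0 \iff c = \bar{y}
```

The cross term vanishes because the deviations from the mean sum to zero. So among constant models, only the one that predicts the mean (the "expected value of y" in the quoted sentence) scores exactly 0.0; any other constant scores below zero.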

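So, by the formula you quoted, R^2 is exactly zero. When the predictors or the model have no predictive power, R^2 ends up close to zero (or even negative).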
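As a quick illustration of that formula, here is a minimal sketch on a made-up five-point array (the data and variable names are hypothetical, chosen only so the sums are easy to check by hand). It compares a constant prediction equal to the mean with a constant prediction that is off by one.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy target values (hypothetical data); the mean is 3.0
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

pred_mean = np.full_like(y_true, y_true.mean())  # always predict the mean
pred_other = np.full_like(y_true, 4.0)           # always predict 4.0

# Mean prediction: u = v = 10, so R^2 = 1 - 10/10
r2_mean = r2_score(y_true, pred_mean)
# Other constant: u = 15, v = 10, so R^2 = 1 - 15/10
r2_other = r2_score(y_true, pred_other)

print(r2_mean, r2_other)
```

The first score comes out as zero and the second as negative, which matches the "it can be negative" remark in the quoted definition.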

We can do this calculation by hand below.

First, the dataset:

import numpy as np  # needed for np.linalg.norm and the constant feature below
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import r2_score
from sklearn import linear_model

iris = load_iris()
df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])

We fit a model and compute its residual sum of squares:

mdl_full = linear_model.LinearRegression()
mdl_full.fit(df[['petal width (cm)']], df['petal length (cm)'])
pred = mdl_full.predict(df[['petal width (cm)']])
# squared L2 norm of the residuals = residual sum of squares (u)
resid_full = np.linalg.norm(df['petal length (cm)'] - pred) ** 2

Fit a model with only an intercept:

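# a single feature that is 0 everywhere, so only the intercept can be fitted
X_const = np.repeat(0, 150).reshape(-1, 1)
mdl_constant = linear_model.LinearRegression()
mdl_constant.fit(X=X_const, y=df['petal length (cm)'])
pred = mdl_constant.predict(X_const)
# residual sum of squares of the constant model = total sum of squares (v)
resid_constant = np.linalg.norm(df['petal length (cm)'] - pred) ** 2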
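To confirm that the intercept-only fit really is "always predict the mean", the sketch below checks the fitted intercept and slope and compares the predictions with scikit-learn's DummyRegressor(strategy="mean"). DummyRegressor is not part of the original answer; it is used here only as an independent baseline, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression

# Column 2 of the iris data is petal length (cm)
y = load_iris().data[:, 2]
# A feature that is 0 everywhere carries no information about y
X_const = np.zeros((len(y), 1))

lin = LinearRegression().fit(X_const, y)
dummy = DummyRegressor(strategy="mean").fit(X_const, y)

# The intercept should equal the mean of y and the slope should be 0
intercept_is_mean = np.isclose(lin.intercept_, y.mean())
slope_is_zero = np.allclose(lin.coef_, 0.0)
# Both models should therefore predict the same constant
same_predictions = np.allclose(lin.predict(X_const), dummy.predict(X_const))

print(intercept_is_mean, slope_is_zero, same_predictions)
```

If all three checks print True, resid_constant above is the total sum of squares v from the quoted definition.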

We can compute R^2 by hand:

(1 - resid_full / resid_constant)
0.9265562307373204

This is exactly what you get from .score():

mdl_full.score(df[['petal width (cm)']],df['petal length (cm)'])
0.9265562307373204

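So you can see that if the full model were exactly the same as the constant model, R^2 would be 0. You can refit the constant model with X=1, X=2, and so on, but the result is essentially the same.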
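The last point can be checked directly. The sketch below refits the intercept-only model with the constant feature set to 0, 1, and 2 and records the training score each time; the loop and variable names are illustrative, not from the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

# Column 2 of the iris data is petal length (cm)
y = load_iris().data[:, 2]

scores = {}
for c in (0, 1, 2):
    # Every sample gets the same feature value c
    X_const = np.full((len(y), 1), float(c))
    model = LinearRegression().fit(X_const, y)
    # score() returns R^2 on the data passed in
    scores[c] = model.score(X_const, y)

print(scores)
```

Each score should be zero up to floating-point rounding, because a constant feature leaves the fit with nothing but the intercept, which settles on the mean of y.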
