我试图在Numpy中计算一个least squares问题(即简单回归的普通最小二乘法(OLS))以找到相应的R²值。但是,在某些情况下,,Numpy返回一个残差的空列表。以下面的over-determined为例(即等式多于未知数)来说明这个问题:
(注:存在常数因子(即截距)(即所有1的初始列向量),因此将使用无中心总平方和(TSS)
import numpy as np
A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])
model_parameters, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
# No Intercept, therefore use Uncentered Total Sum of Squares (TSS)
uncentered_tss = np.sum((y)**2)
numpy_r2 = 1.0 - residuals / uncentered_tss
print("Numpy Model Parameter(s): " + str(model_parameters))
print("Numpy Sum of Squared Residuals (SSR): " + str(residuals))
print("Numpy R²: " + str(numpy_r2))
以下内容生成以下输出:
^{pr2}$... residuals will be empty when the equations are under-determined or well-determined but return values when they are over-determined.
然而,这个问题显然是过度确定的(3个方程对2个未知)。我甚至可以通过计算statsmodels's OLS function给出的regression results来证明残差(因此sum of squared residuals (SSR))是存在的:
import statsmodels.api as sm
A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])
statsmodel_model = sm.OLS(y, A)
regression_results = statsmodels_model.fit()
calculated_r_squared = 1.0 - regression_results.ssr / np.sum((y)**2)
print("Parameters: " + str(regression_results.params))
print("Residuals: " + str(regression_results.resid))
print("Statsmodels R²: " + str(regression_results.rsquared))
print("Manually Calculated R²: " + str(calculated_r_squared))
以下内容生成以下输出:
Parameters: [0.00162999 0.01086661]
Residuals: [ 0.05555556 -0.24444444 0.37777778]
Statsmodels R²: 0.6837606837606838
Manually Calculated R²: 0.6837606837606838
(如您所见,Statsmodels和Numpy模型具有一致的参数。)
为什么Numpy用下面的例子返回一个空的SSR数组?这是numpy.linalg.lstsq的错误吗?如果这是一个bug,那么为什么Statsmodels能够计算sum of squared residuals (SSR),而numpy却不能?在给定最佳拟合平面的情况下,还可以手动清楚地计算残差:
来自^{} 的文档:
矩阵的秩是1。在
注意:您认为“缺少”的残差也可以使用
numpy
找到(您不需要其他软件包):相关问题 更多 >
编程相关推荐