How does sklearn perform linear regression when p > n?

In [30]: lm = LinearRegression().fit(xx,y_train)

In [31]: lm.coef_
Out[31]: array([[ 0.20092363, -0.14378298, -0.33504391, ..., -0.40695124, 0.08619906, -0.08108713]])

In [32]: xx.shape
Out[32]: (1097, 3419)

if n > m:
    # need to extend b matrix as it will be filled with
    # a larger solution matrix
    if len(b1.shape) == 2:
        b2 = np.zeros((n, nrhs), dtype=gelss.dtype)
        b2[:m,:] = b1
    else:
        b2 = np.zeros(n, dtype=gelss.dtype)
        b2[:m] = b1
    b1 = b2

1 answer

User

#1 · Posted on 2024-04-26 09:46:06

When the linear system is underdetermined, sklearn.linear_model.LinearRegression finds the minimum L2-norm solution, i.e.

argmin_w l2_norm(w) subject to Xw = y

This solution is always well defined, and it can be obtained by applying the pseudoinverse of X to y, i.e.

w = np.linalg.pinv(X).dot(y)
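
To make the "minimum norm" claim concrete, here is a minimal sketch using only NumPy. It assumes a small random X with full row rank (the shapes, the seed, and the names w_min, w_other and null_basis are illustrative, not from the original post). It builds a second exact solution by adding a null-space component and checks that the pseudoinverse solution is the shorter of the two.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 10)   # 5 samples, 10 features: underdetermined
y = rng.randn(5)

# Candidate minimum-norm solution via the pseudoinverse
w_min = np.linalg.pinv(X).dot(y)

# With full_matrices=True, the rows of Vt beyond the rank of X span its null space.
# This slicing assumes X has full row rank (rank == number of rows).
_, _, Vt = np.linalg.svd(X, full_matrices=True)
null_basis = Vt[X.shape[0]:]          # shape (5, 10)

# Adding any null-space vector gives another exact solution of Xw = y
w_other = w_min + null_basis.T.dot(rng.randn(null_basis.shape[0]))

print(np.allclose(X.dot(w_min), y))      # both solve the system
print(np.allclose(X.dot(w_other), y))
print(np.linalg.norm(w_min), np.linalg.norm(w_other))
```

Because w_min lies in the row space of X and the added component lies in the null space, the two parts are orthogonal, so the norm of w_other can only be larger.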

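The scipy.linalg.lstsq implementation that LinearRegression relies on calls get_lapack_funcs(('gelss',), ..., which is precisely a solver (provided by LAPACK) that finds the minimum-norm solution via singular value decomposition.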
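
The same behaviour can be checked directly against SciPy. The sketch below assumes a SciPy version whose scipy.linalg.lstsq accepts the lapack_driver argument, and requests 'gelss' explicitly rather than relying on whatever driver is the default in the installed version. The data and variable names are illustrative.

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.RandomState(0)
X = rng.randn(5, 10)   # more unknowns than equations
y = rng.randn(5)

# Ask explicitly for the SVD-based LAPACK driver named in the answer
w_gelss, residues, rank, sing_vals = lstsq(X, y, lapack_driver="gelss")

# Reference: pseudoinverse solution
w_pinv = np.linalg.pinv(X).dot(y)

print(rank)                           # effective rank found by the solver
print(np.allclose(w_gelss, w_pinv))   # compare the two solutions
```

Comparing with np.allclose rather than == avoids false mismatches from floating-point rounding.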

Take a look at this example:

import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(5, 10)
y = rng.randn(5)

from sklearn.linear_model import LinearRegression
lr = LinearRegression(fit_intercept=False)
coef1 = lr.fit(X, y).coef_
coef2 = np.linalg.pinv(X).dot(y)

print(coef1)
print(coef2)

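You will see that coef1 and coef2 agree (up to floating-point rounding). Note that fit_intercept=False is passed to the constructor of the sklearn estimator; otherwise it would subtract the mean of each feature before fitting the model, which yields different coefficients.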
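
As a rough illustration of the fit_intercept remark, the sketch below fits the same data both ways and compares each result with the pseudoinverse solution. The variable names are illustrative, and the comparison uses np.allclose with its default tolerances, which is an assumption about what counts as "equal" here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
X = rng.randn(5, 10)
y = rng.randn(5)

coef_pinv = np.linalg.pinv(X).dot(y)

# No intercept: the model is fit on X and y exactly as given
coef_no_intercept = LinearRegression(fit_intercept=False).fit(X, y).coef_

# With intercept: the data are centered first, so a different system is solved
coef_with_intercept = LinearRegression(fit_intercept=True).fit(X, y).coef_

print(np.allclose(coef_no_intercept, coef_pinv))
print(np.allclose(coef_with_intercept, coef_pinv))
```

Only the fit_intercept=False fit is expected to match pinv(X).dot(y); the centered fit solves a different problem, so its coefficients generally differ.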
