Sklearn LinearRegression() does not require iteration count or learning rate parameters

1 vote

3 answers

104 views

Asked 2025-04-14 18:22

As far as I know, gradient descent minimizes the cost by updating the weights, repeating the process until convergence. For linear regression we have:

m : slope

c : intercept (constant value)

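import numpy as np
from sklearn.linear_model import LinearRegression

# 30 samples, one feature: i plus a random integer offset in [1, 6]
x = np.asarray([i + np.random.randint(1, 7) for i in range(1, 31)]).reshape(-1, 1)
# Targets follow y = 3x + 5 exactly (m = 3, c = 5)
y = np.dot([3], x.T) + 5

reg = LinearRegression()
reg.fit(x, y)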

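I used the sklearn library, but here we pass neither the number of iterations nor the learning rate, either at initialization or when calling reg.fit().

Why doesn't sklearn's linear regression ask for the number of iterations and the learning rate? Does it use some default values, or does it use a different method?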

Tags: machine learning, default parameters, sklearn, gradient descent, linear regression, cost minimization

3 Answers

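As others have said, an iterative method is generally not used; however, one is used when the input is a sparse matrix. Currently, sklearn mostly relies on the default parameters of scipy's sparse solver lsqr. Issue 24601 proposes adding some of these parameters to sklearn's API.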
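As a rough illustration of the sparse case, the sketch below fits LinearRegression on a scipy sparse matrix and, separately, calls scipy's lsqr on the same data with its default parameters. The data, the names (X_sparse, w_true, reg_sparse, coef_lsqr), and the choice of fit_intercept=False are assumptions made for this example; it only shows that both routes land on the same coefficients here, not what sklearn does internally.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: a 200x5 matrix with roughly 70% of entries zeroed out
dense = rng.normal(size=(200, 5))
dense[rng.random((200, 5)) > 0.3] = 0.0
X_sparse = sparse.csr_matrix(dense)

# Noise-free targets from known coefficients, so the answer is checkable
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y = X_sparse @ w_true

# sklearn accepts the sparse matrix directly; no iteration count
# or learning rate is passed
reg_sparse = LinearRegression(fit_intercept=False)
reg_sparse.fit(X_sparse, y)

# scipy's iterative sparse least-squares solver with default parameters;
# the first element of the returned tuple is the solution vector
coef_lsqr = lsqr(X_sparse, y)[0]

print(reg_sparse.coef_)
print(coef_lsqr)
```

Both printed vectors should be close to w_true, up to the solver's stopping tolerance.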

Other optimization proposals for dense and sparse inputs:
https://github.com/scikit-learn/scikit-learn/issues/22855
https://github.com/scikit-learn/scikit-learn/issues/23199
https://github.com/scikit-learn/scikit-learn/issues/14268

Answered 2025-04-14 by Python大师

I would like to add an example to Muhammed's answer (which you should accept, by the way). Here is one:

import numpy as np
np.random.seed(12) # Just to have a reproducible example
from sklearn.linear_model import LinearRegression
X=np.random.normal(0,1,(20,10)) # Or any array X you want. This is just a minimal reproducible example
Y=np.random.normal(0,1,(20,))

# Learning with LinearRegression, without intercept (so we learn M such that Y=XM, M being the array of coefficients)
reg=LinearRegression(fit_intercept=False)
reg.fit(X,Y)
print(reg.coef_) # The coefficients. 
# With my random example and seed, shows
#[-0.06999151 -0.0586993   0.77203288  0.11928812  0.05656448 -0.37281412
# -0.35447307  0.06957882  0.26701851  0.06950227]
# Meaning that model is to predict that Y=-0.06999151*X₀ -0.0586993*X₁ + ...

# For example prediction for a given X
Xtest=np.random.randint(-5,5, (1,10)) # A single sample of 10 features
reg.predict(Xtest)
# returns 4.49749641
# Which is simply
sum(Xtest[0,i]*reg.coef_[i] for i in range(10))
# Or, using linear algebra operation
reg.coef_@Xtest[0]

# Now, Moore-Penrose's version
Coef = np.linalg.inv(X.T@X)@X.T@Y
print(Coef)
#[-0.06999151 -0.0586993   0.77203288  0.11928812  0.05656448 -0.37281412
# -0.35447307  0.06957882  0.26701851  0.06950227]
# See, same coefficients! Not "approximately the same", but the same,
# including the non-significant decimal places, where you would expect some
# numerical error. This shows that the same computation is really being done, not an equivalent one

# prediction is likewise
Coef@Xtest[0]

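There is really nothing mysterious here. Linear regression is just what is known as the Moore-Penrose pseudo-inverse. It can also be understood as least squares. Or, more simply still, as orthogonal projection (the two are the same thing: within a subspace Vec(X₁,X₂,...), the point P at minimum distance ‖X-P‖ from X is the orthogonal projection of X onto that subspace).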

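Even if subspaces, Vec, and Moore-Penrose don't ring a bell (I say "ring a bell" because, if you are doing this kind of thing, you probably took a math course at some point; these are things taught in any science curriculum in the world... but most people quickly forget them), you can at least see that this is not an iterative process. It is just a formula: (XᵀX)⁻¹XᵀY.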
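For readers who want to see where that formula comes from, here is a short least-squares derivation. The symbols β (the coefficient vector) and J (the squared-error cost) are notation introduced for this sketch, and it assumes XᵀX is invertible, i.e. that the columns of X are linearly independent.

```latex
\begin{aligned}
J(\beta) &= \lVert Y - X\beta \rVert^2 = (Y - X\beta)^\top (Y - X\beta) \\
\nabla_\beta J(\beta) &= -2\, X^\top (Y - X\beta) = 0 \\
X^\top X \beta &= X^\top Y \\
\beta &= (X^\top X)^{-1} X^\top Y
\end{aligned}
```

Setting the gradient to zero gives the minimizer in a single step, which is why no iteration count or learning rate appears anywhere.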

I simplified my example here by dropping the intercept. But the intercept is really just the coefficient of an extra all-ones vector.

X1=np.ones((20,11))
X1[:,:10]=X
CoefI = np.linalg.inv(X1.T@X1)@X1.T@Y
# Returns
# array([-0.1068548 , -0.09027332,  0.73712907,  0.1136123 ,  0.0904737 ,
#       -0.36593051, -0.38649945,  0.02849317,  0.18063291,  0.05866195,
#       -0.17597287])
regI=LinearRegression()
regI.fit(X,Y)
regI.coef_
#array([-0.1068548 , -0.09027332,  0.73712907,  0.1136123 ,  0.0904737 ,
#       -0.36593051, -0.38649945,  0.02849317,  0.18063291,  0.05866195])
# i.e. the first 10 coefficients (the ones that apply to the 10 "real" columns of X)
regI.intercept_
#-0.17597287204667314
# i.e. the 11th coefficient from the Moore-Penrose inverse: the one
# that applies to the "all 1" vector.

# Comparison of prediction is almost as easy
regI.predict(Xtest)
CoefI[:10]@Xtest[0]+CoefI[10]
# both return the same 4.633604110000001

So even with an intercept, it is still just a linear algebra formula, not an iterative process.

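Maybe sklearn is more efficient, but on datasets of ordinary size this does not show. On a tiny example like my 20×10 one, computing Moore-Penrose directly is 10 times faster, though that may just be class-initialization overhead. Yet even on a larger dataset such as 2000×1000 (which is still not particularly large), Moore-Penrose remains 3 times faster. Perhaps sklearn ensures better conditioning, or performs better on larger, sparse datasets; I don't know. Mathematically, it does nothing more than the Moore-Penrose pseudo-inverse. Implementation-wise, it is hard to come up with an example where it does better (it is not faster, and I was unable to generate an example where it is more stable).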
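For comparison, NumPy offers a few other ways to evaluate the same least-squares solution without writing out an explicit inverse. The sketch below is only an illustration on freshly generated random data (not the seeded arrays from the example above), the variable names are made up for it, and it makes no claim about which of these routines sklearn calls internally.

```python
import numpy as np

rng = np.random.default_rng(12)
X = rng.normal(0, 1, (20, 10))
Y = rng.normal(0, 1, 20)

# Normal equations with an explicit inverse: (X^T X)^-1 X^T Y
coef_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Same linear system, solved without forming the inverse
coef_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Moore-Penrose pseudo-inverse of X applied to Y
coef_pinv = np.linalg.pinv(X) @ Y

# Dedicated least-squares routine; the solution is the first return value
coef_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]

# All four are non-iterative from the caller's point of view
# and should agree to within floating-point rounding
for name, coef in [("solve", coef_solve), ("pinv", coef_pinv), ("lstsq", coef_lstsq)]:
    print(name, np.allclose(coef_normal, coef))
```

On a well-conditioned X like this one the four results should match; they can start to differ when the columns of X are nearly linearly dependent.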

Answered 2025-04-14 by Python大师

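LinearRegression() does not use gradient descent, so it has no learning_rate parameter. It computes the optimal coefficients directly from a mathematical formula. In other words, the problem has a closed-form solution, so no iterative solver such as gradient descent is needed.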

Gradient descent is a more general method that can handle more complex problems. You can still use it for linear regression, though: see SGDRegressor().

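Both LinearRegression and SGDRegressor(penalty=None) can find the optimal weights for the linear model you describe. They differ in how they get there. The former computes an exact solution directly, while the latter approaches the global optimum step by step by following the direction of steepest descent.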
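To make the contrast concrete, here is a minimal hand-written batch gradient descent for a one-feature problem shaped like the one in the question (y = 3x + 5), next to a direct least-squares solve. This is an illustrative sketch, not SGDRegressor's actual algorithm; the learning rate, iteration count, and the feature standardization step are all choices made for this example.

```python
import numpy as np

# Data with a known answer: slope 3, intercept 5
x = np.arange(1, 31, dtype=float)
y = 3.0 * x + 5.0

# Standardize the feature so a single fixed learning rate behaves well
x_mean, x_std = x.mean(), x.std()
xs = (x - x_mean) / x_std

# Iterative route: this is where the two hyperparameters come from
w, b = 0.0, 0.0
learning_rate = 0.1
n_iterations = 500
for _ in range(n_iterations):
    error = (w * xs + b) - y
    grad_w = 2.0 * np.mean(error * xs)  # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)       # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Map the weights back to the original, unstandardized feature
slope_gd = w / x_std
intercept_gd = b - w * x_mean / x_std

# Direct route: one least-squares solve, no hyperparameters
A = np.column_stack([x, np.ones_like(x)])
slope_cf, intercept_cf = np.linalg.lstsq(A, y, rcond=None)[0]

print(slope_gd, intercept_gd)
print(slope_cf, intercept_cf)
```

Both routes should recover a slope close to 3 and an intercept close to 5; only the first needs a learning rate and an iteration count.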

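In some cases you may need to pick one solver over the other. For example, if your dataset is too large to fit in memory, you cannot use LinearRegression, because it needs to see all of the data at once. SGDRegressor can handle this, because gradient descent can split the data into mini-batches and process a small portion at a time, converging on a solution gradually.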
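The mini-batch idea can be sketched with SGDRegressor's partial_fit method, which updates the model from one chunk of data at a time. In the example below the batches are generated on the fly to stand in for chunks read from disk; make_batch, w_true, b_true, the batch size, and the number of batches are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
b_true = 4.0

def make_batch(n_rows):
    """Stand-in for loading one chunk of a dataset too large for memory."""
    X_batch = rng.normal(size=(n_rows, 3))
    noise = rng.normal(scale=0.01, size=n_rows)
    y_batch = X_batch @ w_true + b_true + noise
    return X_batch, y_batch

# penalty=None gives plain, unregularized linear regression
sgd = SGDRegressor(penalty=None, random_state=0)

# Only one batch is held in memory at any time
for _ in range(100):
    X_batch, y_batch = make_batch(500)
    sgd.partial_fit(X_batch, y_batch)

print(sgd.coef_, sgd.intercept_)
```

The printed coefficients and intercept should end up near w_true and b_true, though not exactly equal to them, since the result is an approximation reached through many small updates.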

Answered 2025-04-14 by Python大师
