使用matplotlib/numpy进行线性回归
我正在尝试在我生成的散点图上进行线性回归,但是我的数据是以列表的形式存在的,而我找到的所有使用 polyfit
的例子都需要使用 arange
。可是 arange
不接受列表。我到处查找如何将列表转换为数组,但没有找到清晰的答案。我是不是漏掉了什么?
接下来,我该如何将我的整数列表作为输入传递给 polyfit
呢?
这是我正在参考的 polyfit
示例:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(data)
y = np.arange(data)
m, b = np.polyfit(x, y, 1)
plt.plot(x, y, 'yo', x, m*x+b, '--k')
plt.show()
8 个回答
20
使用 statsmodels.api.OLS
可以详细了解模型的拟合情况、系数和残差:
import statsmodels.api as sm
df = sm.datasets.get_rdataset('Duncan', 'carData').data
y = df['income']
x = df['education']
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()
print(results.params)
# const 10.603498 <- intercept
# education 0.594859 <- slope
# dtype: float64
print(results.summary())
# OLS Regression Results
# ==============================================================================
# Dep. Variable: income R-squared: 0.525
# Model: OLS Adj. R-squared: 0.514
# Method: Least Squares F-statistic: 47.51
# Date: Thu, 28 Apr 2022 Prob (F-statistic): 1.84e-08
# Time: 00:02:43 Log-Likelihood: -190.42
# No. Observations: 45 AIC: 384.8
# Df Residuals: 43 BIC: 388.5
# Df Model: 1
# Covariance Type: nonrobust
# ==============================================================================
# coef std err t P>|t| [0.025 0.975]
# ------------------------------------------------------------------------------
# const 10.6035 5.198 2.040 0.048 0.120 21.087
# education 0.5949 0.086 6.893 0.000 0.421 0.769
# ==============================================================================
# Omnibus: 9.841 Durbin-Watson: 1.736
# Prob(Omnibus): 0.007 Jarque-Bera (JB): 10.609
# Skew: 0.776 Prob(JB): 0.00497
# Kurtosis: 4.802 Cond. No. 123.
# ==============================================================================
matplotlib 3.5.0 新特性
要绘制最佳拟合线,只需将斜率 m
和截距 b
传入新的 plt.axline
中:
import matplotlib.pyplot as plt
# extract intercept b and slope m
b, m = results.params
# plot y = m*x + b
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
注意,斜率 m
和截距 b
可以很容易地从常见的回归方法中提取出来:
-
import numpy as np m, b = np.polyfit(x, y, deg=1) plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
-
from scipy import stats m, b, *_ = stats.linregress(x, y) plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
-
import statsmodels.api as sm b, m = sm.OLS(y, sm.add_constant(x)).fit().params plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
sklearn.linear_model.LinearRegression
from sklearn.linear_model import LinearRegression reg = LinearRegression().fit(x[:, None], y) b = reg.intercept_ m = reg.coef_[0] plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
43
这段代码:
from scipy.stats import linregress
linregress(x,y) #x and y are arrays or lists.
会输出一个包含以下内容的列表:
slope : float
回归线的斜率
intercept : float
回归线的截距
r-value : float
相关系数
p-value : float
一个假设检验的双侧p值,原假设是斜率为零
stderr : float
估计的标准误差
231
arange
是用来生成列表的(其实是生成 numpy 数组);想了解更多细节,可以输入 help(np.arange)
。你不需要在已有的列表上调用它。
>>> x = [1,2,3,4]
>>> y = [3,5,7,9]
>>>
>>> m,b = np.polyfit(x, y, 1)
>>> m
2.0000000000000009
>>> b
0.99999999999999833
我还想补充一下,我通常会使用 poly1d
,而不是直接写出 "m*x+b" 这种形式和更高阶的公式,所以我写的代码大概是这样的:
import numpy as np
import matplotlib.pyplot as plt
x = [1,2,3,4]
y = [3,5,7,10] # 10, not 9, so the fit isn't perfect
coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef)
# poly1d_fn is now a function which takes in x and returns an estimate for y
plt.plot(x,y, 'yo', x, poly1d_fn(x), '--k') #'--k'=black dashed line, 'yo' = yellow circle marker
plt.xlim(0, 5)
plt.ylim(0, 12)