How to fit a linear regression for each row of a pandas DataFrame and get the slope
I created a pandas DataFrame like this:
import numpy as np
import pandas as pd

ds = {'col1': [11, 22, 33, 24, 15, 6, 7, 68, 79, 10, 161, 12, 113, 147, 115]}
df = pd.DataFrame(data=ds)

# predFeature is a running counter: 1, 2, ..., len(df)
predFeature = []
for i in range(len(df)):
    predFeature.append(0)
    predFeature[i] = predFeature[i-1] + 1
df['predFeature'] = predFeature

# For each row, collect the 4 preceding values of col1 and predFeature
# (the first 4 rows get empty arrays)
arrayTarget = []
arrayPred = []
target = np.array(df['col1'])
predFeature = np.array(df['predFeature'])
for i in range(len(df)):
    arrayTarget.append(target[i-4:i])
    arrayPred.append(predFeature[i-4:i])
df['arrayTarget'] = arrayTarget
df['arrayPred'] = arrayPred
It looks like this:
    col1  predFeature  arrayTarget          arrayPred
 0    11            1  []                   []
 1    22            2  []                   []
 2    33            3  []                   []
 3    24            4  []                   []
 4    15            5  [11, 22, 33, 24]     [1, 2, 3, 4]
 5     6            6  [22, 33, 24, 15]     [2, 3, 4, 5]
 6     7            7  [33, 24, 15, 6]      [3, 4, 5, 6]
 7    68            8  [24, 15, 6, 7]       [4, 5, 6, 7]
 8    79            9  [15, 6, 7, 68]       [5, 6, 7, 8]
 9    10           10  [6, 7, 68, 79]       [6, 7, 8, 9]
10   161           11  [7, 68, 79, 10]      [7, 8, 9, 10]
11    12           12  [68, 79, 10, 161]    [8, 9, 10, 11]
12   113           13  [79, 10, 161, 12]    [9, 10, 11, 12]
13   147           14  [10, 161, 12, 113]   [10, 11, 12, 13]
14   115           15  [161, 12, 113, 147]  [11, 12, 13, 14]
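I need to generate a new column called slope, whose value is the coefficient of a linear regression fitted separately for each row, where:
- target = the array stored in arrayTarget
- predictive feature = the array stored in arrayPred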
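For example:
The slope for the first four rows is null.
The slope for the fifth row comes from a linear regression on the following values:
- independent variable (predictor): [1, 2, 3, 4]
- dependent variable (predicted value): [11, 22, 33, 24]
The result is: 0.10204081632653061.
The slope for the sixth row comes from a linear regression on the following values:
- independent variable (predictor): [2, 3, 4, 5]
- dependent variable (predicted value): [22, 33, 24, 15]
The result is: -0.09090909090909091.
And so on.
Can anyone help me?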
1 Answer
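You can define a function that uses sklearn.linear_model.LinearRegression and apply it to each row of the DataFrame. Be aware that this can be slow on a large DataFrame, because apply with axis=1 calls the function once per row.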
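import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

def calculate_slope(x, y):
    # Rows without a window (empty arrays) get NaN
    if len(x) < 1:
        return np.nan
    # sklearn expects a 2-D feature matrix, hence reshape(-1, 1)
    lr.fit(x.reshape(-1, 1), y)
    return lr.coef_[0]

# arrayTarget is passed as the feature and arrayPred as the response;
# this order reproduces the values expected in the question
df["slope"] = df.apply(
    lambda x: calculate_slope(x["arrayTarget"], x["arrayPred"]), axis=1
)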
    col1  predFeature  arrayTarget          arrayPred             slope
 0    11            1  []                   []                      NaN
 1    22            2  []                   []                      NaN
 2    33            3  []                   []                      NaN
 3    24            4  []                   []                      NaN
 4    15            5  [11, 22, 33, 24]     [1, 2, 3, 4]       0.102041
 5     6            6  [22, 33, 24, 15]     [2, 3, 4, 5]      -0.090909
 6     7            7  [33, 24, 15, 6]      [3, 4, 5, 6]      -0.111111
 7    68            8  [24, 15, 6, 7]       [4, 5, 6, 7]      -0.142857
 8    79            9  [15, 6, 7, 68]       [5, 6, 7, 8]       0.030418
 9    10           10  [6, 7, 68, 79]       [6, 7, 8, 9]       0.030769
10   161           11  [7, 68, 79, 10]      [7, 8, 9, 10]      0.002331
11    12           12  [68, 79, 10, 161]    [8, 9, 10, 11]     0.009048
12   113           13  [79, 10, 161, 12]    [9, 10, 11, 12]   -0.001640
13   147           14  [10, 161, 12, 113]   [10, 11, 12, 13]   0.004698
14   115           15  [161, 12, 113, 147]  [11, 12, 13, 14]   0.002174
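If the row-wise apply turns out to be too slow, one possible alternative is to compute all slopes at once with NumPy. The following is only a sketch and not part of the original answer. It assumes every window has the same length (4 here) and replaces sklearn with the closed-form least-squares slope, sum(dx * dy) / sum(dx ** 2), where dx and dy are the centred feature and response values. The names rolling_slopes, feature, response and window are hypothetical. The argument order mirrors the call in the answer (col1 windows as the feature, predFeature windows as the response), since that order reproduces the values expected in the question; swapping the two arguments would give the regression of col1 on predFeature instead.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_slopes(feature, response, window=4):
    """Slope of response regressed on feature over the preceding rows.

    Entry i uses rows i-window .. i-1; the first `window` entries are NaN.
    (Hypothetical helper, not taken from the original answer.)
    """
    feature = np.asarray(feature, dtype=float)
    response = np.asarray(response, dtype=float)
    out = np.full(len(feature), np.nan)
    if len(feature) <= window:
        return out
    # One row per window; drop the last window, which has no following row
    fw = sliding_window_view(feature, window)[:-1]
    rw = sliding_window_view(response, window)[:-1]
    # Centre each window, then slope = sum(dx * dy) / sum(dx ** 2)
    fd = fw - fw.mean(axis=1, keepdims=True)
    rd = rw - rw.mean(axis=1, keepdims=True)
    out[window:] = (fd * rd).sum(axis=1) / (fd ** 2).sum(axis=1)
    return out


col1 = [11, 22, 33, 24, 15, 6, 7, 68, 79, 10, 161, 12, 113, 147, 115]
pred_feature = np.arange(1, len(col1) + 1)

# Same argument order as the answer: col1 windows are the feature
slopes = rolling_slopes(col1, pred_feature)
print(np.round(slopes[4:6], 6))
```

For the fifth and sixth rows this gives 0.102041 and -0.090909, matching the table above. With the DataFrame from the question, df["slope"] = rolling_slopes(df["col1"], df["predFeature"]) should produce the same column. The sketch assumes NumPy 1.20 or later (for sliding_window_view), and a window whose feature values are all identical would divide by zero and yield NaN or inf rather than a slope.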