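I am trying to fill in missing values in a pandas DataFrame based on the date.
The values range approximately from 54.5 to 71.5. The value rises when on/off is 1 and falls when on/off is 0.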
>> before (example)
day_time value on/off
2020-03-01 0:05 71.35 0
2020-03-01 0:06 68.425 0
2020-03-01 0:07 66.1 0
2020-03-01 0:08 64.125 0
2020-03-01 0:09 58.9 0
2020-03-01 0:10 56.075 0
2020-03-01 0:11 54.35 0
2020-03-01 0:12 57.025 1
2020-03-01 0:13 59.35 1
2020-03-01 0:14 63.2 1
2020-03-01 0:15 65.375 1
2020-03-01 0:16 66.35 1
2020-03-01 0:17 67.25 1
2020-03-01 0:18 70.05 1
2020-03-01 0:19 NaN NaN
2020-03-01 0:20 NaN NaN
2020-03-01 0:21 NaN NaN
2020-03-01 0:22 NaN NaN
2020-03-01 0:23 NaN NaN
2020-03-01 0:24 NaN NaN
2020-03-01 0:25 NaN NaN
2020-03-01 0:26 NaN NaN
2020-03-01 0:27 NaN NaN
2020-03-01 0:28 NaN NaN
2020-03-01 0:29 NaN NaN
2020-03-01 0:30 NaN NaN
2020-03-01 0:31 NaN NaN
2020-03-01 0:32 65.475 1
2020-03-01 0:33 65.475 1
2020-03-01 0:34 65.525 0
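Where values are missing, I want to compute them and fill them in.
I want to compute them so that they stay within the 54.5 to 71.5 range, repeatedly increasing or decreasing by the (average) change in value observed before the gap.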
>> after (example)
day_time value on/off
2020-03-01 0:05 71.35 0
2020-03-01 0:06 68.425 0
2020-03-01 0:07 66.1 0
2020-03-01 0:08 64.125 0
2020-03-01 0:09 58.9 0
2020-03-01 0:10 56.075 0
2020-03-01 0:11 54.35 0
2020-03-01 0:12 57.025 1
2020-03-01 0:13 59.35 1
2020-03-01 0:14 63.2 1
2020-03-01 0:15 65.375 1
2020-03-01 0:16 66.35 1
2020-03-01 0:17 67.25 1
2020-03-01 0:18 70.05 1
2020-03-01 0:19 68.05 0
2020-03-01 0:20 67.35 0
2020-03-01 0:21 65.21 0
2020-03-01 0:22 63.275 0
2020-03-01 0:23 65.225 0
2020-03-01 0:24 63.65 0
2020-03-01 0:25 61.45 0
2020-03-01 0:26 58.45 0
2020-03-01 0:27 56.275 0
2020-03-01 0:28 55.475 0
2020-03-01 0:29 54.3 0
2020-03-01 0:30 57.7 1
2020-03-01 0:31 59.5 1
2020-03-01 0:32 61.4 1
2020-03-01 0:33 63.5 1
2020-03-01 0:34 65.525 1
I tried the following:
    import numpy as np
    import pandas as pd

    # result: DataFrame with columns pump (on/off), g_hight (value) and fi_usage.
    # Each pass can only fill the first remaining NaN row of a gap, so it loops once per row.
    for _ in result.iterrows():
        # Carry the pump state forward, switching at the 54 / 72 limits.
        result['pump'] = np.where(pd.isnull(result.pump), np.where((result.pump.shift(1) == 0) & (result.g_hight.shift(1) > 54), 0, result.pump), result.pump)
        result['pump'] = np.where(pd.isnull(result.pump), np.where((result.pump.shift(1) == 0) & (result.g_hight.shift(1) < 72), 1, result.pump), result.pump)
        result['pump'] = np.where(pd.isnull(result.pump), np.where((result.pump.shift(1) == 1) & (result.g_hight.shift(1) < 72), 1, result.pump), result.pump)
        result['pump'] = np.where(pd.isnull(result.pump), np.where((result.pump.shift(1) == 1) & (result.g_hight.shift(1) > 54), 0, result.pump), result.pump)
        # Next value = previous value - previous usage (+ 0.2503 while the pump is on).
        value_ON = result['g_hight'].shift(1) - result['fi_usage'].shift(1) + 0.2503
        value_OFF = result['g_hight'].shift(1) - result['fi_usage'].shift(1)
        result['g_hight'] = np.where(pd.isnull(result.g_hight) & pd.notna(result.pump), np.where(result.pump == 0, value_OFF, value_ON), result.g_hight)
    result.to_csv('result_1.csv', index=False)
It works, but it is too slow. How can I improve this process?
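The attempt above recomputes whole-column operations once per row, so its cost grows roughly with the square of the number of rows. A minimal sketch of a single-pass alternative follows. It is not the asker's code: the function name `fill_sawtooth` is hypothetical, it uses the `value` and `on/off` columns from the example tables, and it assumes the step sizes can be taken as the mean observed rise (while on) and mean observed fall (while off) rather than from a usage column.

```python
import numpy as np
import pandas as pd

def fill_sawtooth(df, lo=54.5, hi=71.5):
    """Fill NaN rows in one pass (hypothetical helper, not from the question)."""
    out = df.copy()
    val = out["value"].to_numpy(dtype=float)
    on = out["on/off"].to_numpy(dtype=float)

    # Change from row i to row i+1, paired with the state of row i+1.
    step = np.diff(val)
    state = on[1:]
    up = np.nanmean(step[state == 1])    # mean rise while on (assumed step)
    down = np.nanmean(step[state == 0])  # mean fall while off (negative)

    for i in range(1, len(val)):
        if np.isnan(val[i]):
            if val[i - 1] >= hi:      # reached the top: switch off
                on[i] = 0
            elif val[i - 1] <= lo:    # reached the bottom: switch on
                on[i] = 1
            else:                     # otherwise keep the previous state
                on[i] = on[i - 1]
            val[i] = val[i - 1] + (up if on[i] == 1 else down)

    out["value"] = val
    out["on/off"] = on
    return out
```

Known rows are left untouched, so the filled values may not line up exactly with the first known value after a gap.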
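The question is somewhat vague and may take a fair amount of work, so I'll outline a plan below.
I would start with a simple model, such as value = A * sin(b * x) + B * cos(b * x) + C, where x is time.
Why a sine? Because it is periodic, oscillating between growth and decay, which makes it a good simple model.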
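I suggest estimating b first. It is a single value for the whole cycle: how long does it take to reach the maximum again? Say, t units of time. Then b = 2 * pi / t.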
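As a rough sketch of this step, the period t can be estimated as the mean spacing between local maxima of the known values. The helper name `estimate_b` and the `step` parameter (time between consecutive rows) are assumptions for illustration. The approach also assumes the series is smooth enough that each cycle has a single peak, and that no gap hides a peak.

```python
import numpy as np

def estimate_b(values, step=1.0):
    """Estimate b = 2*pi/t from the mean spacing between local maxima."""
    v = np.asarray(values, dtype=float)
    # Interior points higher than the left neighbour and not lower than the right.
    # Comparisons with NaN are False, so missing rows never count as peaks.
    peaks = np.where((v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:]))[0] + 1
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate the period")
    t = np.mean(np.diff(peaks)) * step  # period in the same units as `step`
    return 2 * np.pi / t
```

With noisy data, smoothing the series first (or picking the period by eye from a plot) is likely to give a more reliable t.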
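Once b is determined, I recommend the following trick: if we know b, then we know sin(bx) and cos(bx), so all that remains to find is A, B and C. A regression on the known values gives them. Finally, apply the formula to estimate the missing values.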
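A minimal sketch of the regression step, assuming the model value = A * sin(b * x) + B * cos(b * x) + C, evenly spaced rows (so x can be the row number), and a column named `value` as in the example tables. The function name `fill_with_sinusoid` is hypothetical. With b fixed the model is linear in A, B and C, so an ordinary least-squares fit via `np.linalg.lstsq` is enough.

```python
import numpy as np
import pandas as pd

def fill_with_sinusoid(df, b, col="value"):
    """Fit A, B, C on the known rows and fill the NaN rows from the fit."""
    x = np.arange(len(df), dtype=float)  # assumes one row per time step
    # Design matrix: one column each for sin(bx), cos(bx) and the constant C.
    X = np.column_stack([np.sin(b * x), np.cos(b * x), np.ones_like(x)])
    y = df[col].to_numpy(dtype=float)
    known = ~np.isnan(y)

    # Least-squares regression using only the rows with known values.
    coef, *_ = np.linalg.lstsq(X[known], y[known], rcond=None)

    out = df.copy()
    fitted = X @ coef
    out.loc[~known, col] = fitted[~known]  # known rows stay as they were
    return out, coef  # coef holds (A, B, C)
```

A pure sinusoid will not reproduce the sharp turnarounds at 54.5 and 71.5, so the filled values could be clipped to that range afterwards if needed.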