Pandas: filling missing data with np.where() and iterrows() is too slow. How can I speed it up?

>> before (example)

day_time          value    on/off
2020-03-01 0:05   71.35    0
2020-03-01 0:06   68.425   0
2020-03-01 0:07   66.1     0
2020-03-01 0:08   64.125   0
2020-03-01 0:09   58.9     0
2020-03-01 0:10   56.075   0
2020-03-01 0:11   54.35    0
2020-03-01 0:12   57.025   1
2020-03-01 0:13   59.35    1
2020-03-01 0:14   63.2     1
2020-03-01 0:15   65.375   1
2020-03-01 0:16   66.35    1
2020-03-01 0:17   67.25    1
2020-03-01 0:18   70.05    1
2020-03-01 0:19   NaN      NaN
2020-03-01 0:20   NaN      NaN
2020-03-01 0:21   NaN      NaN
2020-03-01 0:22   NaN      NaN
2020-03-01 0:23   NaN      NaN
2020-03-01 0:24   NaN      NaN
2020-03-01 0:25   NaN      NaN
2020-03-01 0:26   NaN      NaN
2020-03-01 0:27   NaN      NaN
2020-03-01 0:28   NaN      NaN
2020-03-01 0:29   NaN      NaN
2020-03-01 0:30   NaN      NaN
2020-03-01 0:31   NaN      NaN
2020-03-01 0:32   65.475   1
2020-03-01 0:33   65.475   1
2020-03-01 0:34   65.525   0

>> after (example)

day_time          value    on/off
2020-03-01 0:05   71.35    0
2020-03-01 0:06   68.425   0
2020-03-01 0:07   66.1     0
2020-03-01 0:08   64.125   0
2020-03-01 0:09   58.9     0
2020-03-01 0:10   56.075   0
2020-03-01 0:11   54.35    0
2020-03-01 0:12   57.025   1
2020-03-01 0:13   59.35    1
2020-03-01 0:14   63.2     1
2020-03-01 0:15   65.375   1
2020-03-01 0:16   66.35    1
2020-03-01 0:17   67.25    1
2020-03-01 0:18   70.05    1
2020-03-01 0:19   68.05    0
2020-03-01 0:20   67.35    0
2020-03-01 0:21   65.21    0
2020-03-01 0:22   63.275   0
2020-03-01 0:23   65.225   0
2020-03-01 0:24   63.65    0
2020-03-01 0:25   61.45    0
2020-03-01 0:26   58.45    0
2020-03-01 0:27   56.275   0
2020-03-01 0:28   55.475   0
2020-03-01 0:29   54.3     0
2020-03-01 0:30   57.7     1
2020-03-01 0:31   59.5     1
2020-03-01 0:32   61.4     1
2020-03-01 0:33   63.5     1
2020-03-01 0:34   65.525   1

for i in result.iterrows():
    result['pump'] = np.where(pd.isnull(result.pump),
                              np.where((result.pump.shift(1) == 0) & (result.g_hight.shift(1) > 54), 0, result.pump),
                              result.pump)
    result['pump'] = np.where(pd.isnull(result.pump),
                              np.where((result.pump.shift(1) == 0) & (result.g_hight.shift(1) < 72), 1, result.pump),
                              result.pump)
    result['pump'] = np.where(pd.isnull(result.pump),
                              np.where((result.pump.shift(1) == 1) & (result.g_hight.shift(1) < 72), 1, result.pump),
                              result.pump)
    result['pump'] = np.where(pd.isnull(result.pump),
                              np.where((result.pump.shift(1) == 1) & (result.g_hight.shift(1) > 54), 0, result.pump),
                              result.pump)
    value_ON = result['g_hight'].shift(1) - result['fi_usage'].shift(1) + 0.2503
    value_OFF = (result['g_hight'].shift(1) - result['fi_usage'].shift(1))
    result['g_hight'] = np.where((pd.isnull(result.g_hight)) & (pd.notna(result.pump)),
                                 np.where(result.pump == 0, value_OFF, value_ON),
                                 result.g_hight)
result.to_csv('result_1.csv', index = False)

1 answer

User

#1 · Posted 2024-06-06 12:28:32

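This question is somewhat vague and could take a lot of work, so I will only outline a plan below.

I would start with a simple model, such as

value = a sin(bx + c) + d

Why a sine? Because it is periodic, oscillating between growth and decay, which makes it a good simple model.

I suggest estimating b first. It is a single value for the whole cycle: how long does it take to reach the maximum again? Call that time t. Then b = 2 * pi / t.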
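One rough way to get t from the data is to average the spacing between successive local maxima. The sketch below assumes evenly spaced samples (one per minute in the example) and a fairly smooth signal; `estimate_b` is a hypothetical helper name, and the demo series is synthetic, not the data from the question.

```python
import numpy as np

def estimate_b(values, dt=1.0):
    """Estimate b = 2*pi/t, where t is the mean spacing between local maxima.

    Assumes evenly spaced samples, dt time units apart. Comparisons with NaN
    are False, so missing samples never count as peaks, but a peak hidden
    inside a gap will make the estimated period too long.
    """
    v = np.asarray(values, dtype=float)
    # An interior sample is a peak if it is strictly greater than both neighbours.
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate the period")
    t = np.mean(np.diff(peaks)) * dt
    return 2 * np.pi / t

# Synthetic demo: a clean wave with an 18-sample period.
x = np.arange(60)
demo = 63 + 8 * np.cos(2 * np.pi * x / 18)
b = estimate_b(demo)
```

On noisy measurements it would probably help to smooth the series first, or to measure the spacing between minima as well and average the two.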

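Once b is fixed, I recommend the following trick:

value = a sin(bx + c) + d = A sin(bx) + B cos(bx) + C

where A = a cos(c), B = a sin(c) and C = d, by the angle-addition formula.

If we know b, then we know sin(bx) and cos(bx), so all that remains is to find A, B and C. The model is linear in those three unknowns, so a regression on the known values finds them. Finally, apply the formula to estimate the missing values.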
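A minimal sketch of that regression step, assuming b is already known and using ordinary least squares via `np.linalg.lstsq`. The function name `fit_and_fill` and the synthetic series are illustrative assumptions, not part of the question's data.

```python
import numpy as np

def fit_and_fill(x, values, b):
    """Fit value = A*sin(b*x) + B*cos(b*x) + C on the known points,
    then fill the NaN entries with the fitted curve."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(values, dtype=float)
    # Design matrix with columns sin(bx), cos(bx) and a constant term.
    M = np.column_stack([np.sin(b * x), np.cos(b * x), np.ones_like(x)])
    known = ~np.isnan(y)
    # Least-squares solution for (A, B, C) using only the observed rows.
    coef, *_ = np.linalg.lstsq(M[known], y[known], rcond=None)
    filled = y.copy()
    filled[~known] = M[~known] @ coef
    return filled, coef

# Synthetic demo: a known wave with a 13-sample gap.
x = np.arange(40)
b = 2 * np.pi / 18
truth = 3 * np.sin(b * x) + 4 * np.cos(b * x) + 63
y = truth.copy()
y[14:27] = np.nan
filled, (A, B, C) = fit_and_fill(x, y, b)
```

With a pandas DataFrame, x could be the number of minutes since the first timestamp and `values` the column containing the gaps. This only estimates the value column; the on/off column would still need its own rule.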
