Create a rolling sum column that resets after reaching a threshold


1 Answer

Anonymous, 1 day ago

The approach below is by no means memory-efficient (it materializes one full window per row), but it should be faster than a loop. It assumes the timestamps are evenly spaced with no gaps, so that the work can be delegated to NumPy; if they are not, you can fill in the missing timestamps before calling it.

<pre><code>import numpy as np
import pandas as pd


def rolling_window(a, window):
    # Left-pad with zeros so every row gets a full-length window (1-D input only).
    b = np.concatenate((np.zeros(window - 1), a))
    return np.array([b[..., i:i + window] for i in range(a.size)])


def dynamic_window(w: np.ndarray, reset):
    # Running sum of the elements *before* each position, bucketed by `reset`.
    regions = np.hstack([
        np.zeros((w.shape[0], 1)),
        np.cumsum(w, axis=-1)[:, :-1]
    ]) // reset
    # Keep only the elements in the same bucket as the last element of each window.
    return w * (regions == regions[:, -1][:, np.newaxis])
</code></pre>

Use it like this:

<pre><code># sample df
# please always provide a callable line of code
# you could get it with `df.head(10).to_dict('split')`
df = pd.DataFrame({
    'myDate': pd.date_range('2020-04-01 10:00', periods=10, freq='min'),
    'V': [0, 1, 2, 1, 0, 4, 1, 1, 0, 3]
})

# include all time increments
df = pd.concat([
    df,
    pd.DataFrame(pd.date_range(df['myDate'].min(), df['myDate'].max(), freq='min'),
                 columns=['myDate'])
]).drop_duplicates(subset=['myDate']).fillna(0).sort_values('myDate')

df['4min_sum'] = df.rolling('4min', on='myDate')['V'].sum()

# use the functions
df['desired_column'] = dynamic_window(
    rolling_window(df['V'].to_numpy(), 4),
    3).sum(axis=-1)
</code></pre>

Output:

<pre><code>               myDate    V  4min_sum  desired_column
0 2020-04-01 10:00:00  0.0       0.0             0.0
1 2020-04-01 10:01:00  1.0       1.0             1.0
2 2020-04-01 10:02:00  2.0       3.0             3.0
3 2020-04-01 10:03:00  1.0       4.0             1.0
4 2020-04-01 10:04:00  0.0       4.0             1.0
5 2020-04-01 10:05:00  4.0       7.0             4.0
6 2020-04-01 10:06:00  1.0       6.0             1.0
7 2020-04-01 10:07:00  1.0       6.0             2.0
8 2020-04-01 10:08:00  0.0       6.0             0.0
9 2020-04-01 10:09:00  3.0       5.0             5.0
</code></pre>

Note that at 10:05 it outputs 4 rather than the 5 in your expected output. By your own logic it should be 4: the window contains <code>[2, 1, 0, 4]</code>, and since the first two numbers sum to 3 (the threshold), the window should reset at that point and return 0 + 4.
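To make the 10:05 case easier to follow, here is a minimal sketch that traces a single window through the same steps as `dynamic_window`. The helper name `trace_window` is hypothetical and not part of the answer's code; it only exposes the intermediate arrays for one 1-D window.

```python
import numpy as np


def trace_window(window, reset):
    """Hypothetical helper: show the intermediate arrays for one window."""
    w = np.asarray(window, dtype=float)
    # Sum of the elements strictly before each position (exclusive cumsum).
    before = np.concatenate(([0.0], np.cumsum(w)[:-1]))
    # Bucket index: how many multiples of `reset` were reached before each position.
    regions = before // reset
    # Keep only elements that share a bucket with the last element.
    mask = regions == regions[-1]
    return before, regions, mask, float((w * mask).sum())


before, regions, mask, total = trace_window([2, 1, 0, 4], reset=3)
print(before)   # [0. 2. 3. 3.]
print(regions)  # [0. 0. 1. 1.]
print(mask)     # [False False  True  True]
print(total)    # 4.0
```

The first two elements fall in bucket 0 and are dropped once the running sum reaches 3, so only `0 + 4` remains.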
