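<p>I could not find a vectorized way to reset the sum to 0 each time the threshold is reached.</p>
<p>However, the underlying container of a Pandas column is a NumPy array, and iterating over a NumPy array takes an acceptable amount of time. So I would do this:</p>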
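<pre><code>import numpy as np

arr = np.zeros(len(df), dtype='int')
cum = 0                      # running sum since the last reset
src = df['V'].values
dt = df['myDate'].values
start = 0                    # oldest row still counted in cum
for i in range(len(df)):
    cum += src[i]
    # drop rows more than 4 minutes older than the current row
    # (rows exactly 4 minutes old are kept; use <= to drop them too)
    while dt[start] < dt[i] - np.timedelta64(4, 'm'):
        cum -= src[start]
        start += 1
    arr[i] = cum
    if cum >= 3:
        # threshold reached: restart the sum from the next row, so that
        # row i (no longer part of cum) is never subtracted later
        cum = 0
        start = i + 1
df['desired_column'] = arr
</code></pre>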
<p>It gives:</p>
<pre><code>                 myDate  V  rolling  desired_column
1   2020-04-01 10:00:00  0        0               0
2   2020-04-01 10:01:00  1        1               1
3   2020-04-01 10:02:00  2        3               3
4   2020-04-01 10:03:00  1        4               1
5   2020-04-01 10:04:00  0        4               1
6   2020-04-01 10:05:00  4        7               5
7   2020-04-01 10:06:00  1        6               1
8   2020-04-01 10:07:00  1        6               2
9   2020-04-01 10:08:00  0        6               2
10  2020-04-01 10:09:00  3        5               5
</code></pre>
<p>On my i5 machine, this takes only a few seconds for an array of length 1000000 (and about 90 seconds for an array of 10000).</p>
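<p>For checking the logic without a DataFrame, the same loop can be sketched as a standalone function. The name <code>windowed_sum_with_reset</code> and its parameters are hypothetical, chosen only for this illustration. The sketch uses standard-library <code>datetime</code> values, and it assumes the timestamps are sorted in ascending order. After a reset it moves the window start to the following row, so the row that triggered the reset is not subtracted again once it leaves the window.</p>

```python
from datetime import datetime, timedelta


def windowed_sum_with_reset(values, times, window, threshold):
    """Running sum over a trailing time window, restarted once it reaches threshold.

    Hypothetical helper for illustration; assumes `times` is sorted ascending.
    """
    out = []
    cum = 0
    start = 0  # index of the oldest row still counted in cum
    for i in range(len(values)):
        cum += values[i]
        # Slide the left edge: drop rows older than `window`.
        while times[start] < times[i] - window:
            cum -= values[start]
            start += 1
        out.append(cum)
        if cum >= threshold:
            # Restart from the next row; row i is no longer counted.
            cum = 0
            start = i + 1
    return out


# Sample data from the table above: one row per minute starting at 10:00.
times = [datetime(2020, 4, 1, 10, m) for m in range(10)]
values = [0, 1, 2, 1, 0, 4, 1, 1, 0, 3]
result = windowed_sum_with_reset(values, times, timedelta(minutes=4), 3)
print(result)  # [0, 1, 3, 1, 1, 5, 1, 2, 2, 5]
```

<p>The result matches the <code>desired_column</code> values shown above. The function only indexes, compares and subtracts, so it should also accept the NumPy arrays from <code>df['V'].values</code> and <code>df['myDate'].values</code> with <code>np.timedelta64(4, 'm')</code> as the window.</p>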