<p>A solution that works if the <code>Timestamp</code> column is sorted and contains every day of each month (it counts rows, not calendar days):</p>
<p>You can first label the groups of rows with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.cumsum.html" rel="nofollow"><code>cumsum</code></a>, then filter the <code>OK</code> rows with a boolean <code>Series</code>, group them by those labels and number the rows in each group with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html" rel="nofollow"><code>cumcount</code></a>. The <code>FAIL</code> rows end up as <code>NaN</code>, so replace them with <code>0</code> using <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html" rel="nofollow"><code>fillna</code></a> and convert the output column to integer with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html" rel="nofollow"><code>astype</code></a>:</p>
<pre><code># reverse the row order so the cumulative sum runs from the last row backwards
df = df[::-1]
print((df.Status == 'FAIL').astype(int).cumsum())
5    1
4    1
3    1
2    2
1    2
0    2
Name: Status, dtype: int32

# keep only the OK rows, group them by the labels above and
# number the rows within each group, starting at 1
df['Days_until_next_fail'] = (df[df.Status == 'OK']
                              .groupby((df.Status == 'FAIL').astype(int).cumsum())
                              .cumcount() + 1)
# replace NaN (the FAIL rows) with 0 and convert the values to integer
df['Days_until_next_fail'] = df['Days_until_next_fail'].fillna(0).astype(int)
# restore the original ordering
df.sort_index(inplace=True)
print(df)
   Timestamp Status  Days_until_next_fail
0 2012-01-01     OK                     2
1 2012-01-02     OK                     1
2 2012-01-03   FAIL                     0
3 2012-01-05     OK                     2
4 2012-01-06     OK                     1
5 2012-01-07   FAIL                     0
</code></pre>
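<p>For reference, the same steps can be packed into a small self-contained Python 3 sketch. The helper name <code>days_until_next_fail</code> and the rebuilt sample frame are illustrative assumptions, not part of the original answer; the logic is the reverse, <code>cumsum</code>, <code>cumcount</code> sequence described above.</p>

```python
import pandas as pd


def days_until_next_fail(status: pd.Series) -> pd.Series:
    """Hypothetical helper: number of rows until the next 'FAIL' (0 on FAIL rows)."""
    # Walk the series backwards so each FAIL opens the group of rows that precede it.
    rev = status.iloc[::-1]
    groups = (rev == 'FAIL').cumsum()
    ok = rev == 'OK'
    # Number the OK rows inside each group, starting at 1.
    counts = rev[ok].groupby(groups[ok]).cumcount() + 1
    # FAIL rows are missing from counts; reindexing fills them with 0
    # and restores the original row order.
    return counts.reindex(status.index, fill_value=0).astype(int)


# Sample data rebuilt from the printed output (an assumption about the input).
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03',
                                 '2012-01-05', '2012-01-06', '2012-01-07']),
    'Status': ['OK', 'OK', 'FAIL', 'OK', 'OK', 'FAIL'],
})
df['Days_until_next_fail'] = days_until_next_fail(df['Status'])
print(df)
```

<p>Note that rows after the last <code>FAIL</code> would still be numbered as if a failure followed them, so those values should be treated with care.</p>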
<p>A more general solution (all dates must be sorted):</p>
^{pr2}$
<p>If you need to convert the column from <code>timedelta</code> to <code>int</code>:</p>
<pre><code>import numpy as np

# df is assumed to be in reverse order here (df = df[::-1]),
# so the first row of each group is its FAIL row
df['fail_days'] = (df.groupby((df.Status == 'FAIL').astype(int).cumsum())
                     .apply(lambda x: ((x.Timestamp.iloc[0] - x.Timestamp) / np.timedelta64(1, 'D'))
                                      .astype(int))
                     .reset_index(level=0, drop=True))
print(df.sort_index())
   Timestamp Status  fail_days
0 2011-12-28     OK          6
1 2012-01-02     OK          1
2 2012-01-03   FAIL          0
3 2012-01-05     OK          2
4 2012-01-06     OK          1
5 2012-01-07   FAIL          0
</code></pre>
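<p>As an alternative sketch (a different technique from the <code>groupby</code>/<code>apply</code> code above, not the original method), the calendar-day difference can also be computed without reversing the frame: keep the timestamp only on <code>FAIL</code> rows, back-fill it, and subtract. The sample frame is rebuilt from the printed output and is an assumption; the rows must be sorted by <code>Timestamp</code>.</p>

```python
import pandas as pd

# Sample data rebuilt from the printed output (an assumption about the input).
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2011-12-28', '2012-01-02', '2012-01-03',
                                 '2012-01-05', '2012-01-06', '2012-01-07']),
    'Status': ['OK', 'OK', 'FAIL', 'OK', 'OK', 'FAIL'],
})

# Keep the timestamp only on FAIL rows (NaT elsewhere), then back-fill so that
# every row carries the date of the next FAIL at or after it.
next_fail = df['Timestamp'].where(df['Status'] == 'FAIL').bfill()

# Difference in whole calendar days between each row and its next FAIL.
df['fail_days'] = (next_fail - df['Timestamp']).dt.days
print(df)
```

<p>Rows after the last <code>FAIL</code> have no following failure, so their result would be missing rather than a number.</p>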