Answers to: Calculating the difference between values with a filter using Pandas


1 Answer

Anonymous, 1 day ago

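If the <code>Timestamp</code> column is sorted and contains every day of each month, one solution is to first mark the groups of rows with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.cumsum.html" rel="nofollow"><code>cumsum</code></a>, then group by that <code>Series</code> and number the rows inside each group with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html" rel="nofollow"><code>cumcount</code></a>. This counts rows rather than dates, which is why no day may be missing. The rows that were filtered out end up as <code>NaN</code>, so replace them with <code>0</code> using <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html" rel="nofollow"><code>fillna</code></a> and convert the output column to integers with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html" rel="nofollow"><code>astype</code></a>:

<pre><code># reverse the row order so that cumsum numbers the FAIL rows from the end
df = df[::-1]
print((df.Status == 'FAIL').astype(int).cumsum())
5    1
4    1
3    1
2    2
1    2
0    2
Name: Status, dtype: int32

# keep only the OK rows and number them within each FAIL group
df['Days_until_next_fail'] = (df[df.Status == 'OK']
                              .groupby((df.Status == 'FAIL').astype(int).cumsum())
                              .cumcount() + 1)

# replace NaN with 0 and convert the values to integers
df['Days_until_next_fail'] = df['Days_until_next_fail'].fillna(0).astype(int)

# restore the original row order
df.sort_index(inplace=True)
print(df)
   Timestamp Status  Days_until_next_fail
0 2012-01-01     OK                     2
1 2012-01-02     OK                     1
2 2012-01-03   FAIL                     0
3 2012-01-05     OK                     2
4 2012-01-06     OK                     1
5 2012-01-07   FAIL                     0
</code></pre>

A more general solution only requires the dates to be sorted: group the rows the same way, then subtract each row's <code>Timestamp</code> from the <code>Timestamp</code> of the first row in its group, which is the <code>FAIL</code> row when the frame is in reversed order. To get the column as <code>int</code> rather than <code>timedelta</code>, divide by one day before converting:

<pre><code>import numpy as np

# the frame must be in reversed order, as in the first example
df = df[::-1]

# per group: (FAIL date - row date) expressed in whole days
df['fail_days'] = (df.groupby((df.Status == 'FAIL').astype(int).cumsum())
                     .apply(lambda x: ((x.Timestamp.iloc[0] - x.Timestamp)
                                       / np.timedelta64(1, 'D')).astype(int))
                     .reset_index(level=0, drop=True))

print(df.sort_index())
   Timestamp Status  fail_days
0 2011-12-28     OK          6
1 2012-01-02     OK          1
2 2012-01-03   FAIL          0
3 2012-01-05     OK          2
4 2012-01-06     OK          1
5 2012-01-07   FAIL          0
</code></pre>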
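For comparison, here is a minimal self-contained sketch of the same calculation that can be run as-is. It is not the method from the answer above: instead of reversing the frame and grouping, it keeps the <code>Timestamp</code> only on <code>FAIL</code> rows and back-fills it, so every row sees the date of the next failure. The sample frame mirrors the second example's output, and the helper name <code>days_until_next_fail</code> is made up for illustration. It assumes the dates are sorted in ascending order.

```python
import pandas as pd

# Sample data modeled on the second example (sorted dates, with gaps).
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2011-12-28", "2012-01-02", "2012-01-03",
        "2012-01-05", "2012-01-06", "2012-01-07",
    ]),
    "Status": ["OK", "OK", "FAIL", "OK", "OK", "FAIL"],
})


def days_until_next_fail(frame):
    """Hypothetical helper: whole days from each row to the next FAIL row."""
    is_fail = frame["Status"].eq("FAIL")
    # Keep the date on FAIL rows only (NaT elsewhere), then back-fill so that
    # each row carries the date of the next FAIL at or after it.
    next_fail = frame["Timestamp"].where(is_fail).bfill()
    # Subtracting two datetime columns gives timedeltas; .dt.days extracts days.
    return (next_fail - frame["Timestamp"]).dt.days


df["fail_days"] = days_until_next_fail(df)
print(df)
```

One difference worth knowing: rows after the last <code>FAIL</code> have no next failure, so this sketch leaves them as missing values instead of computing a number for them.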
