Calculating time differences between events in pandas
+--------------------------------------------------------------+
| 2014-08-12T10:30:14.6938893+10:00 Reading received START |
| 2014-08-12T10:30:14.6938893+10:00 Reading received ADD |
| 2014-08-12T10:30:14.7094893+10:00 Reading received UPDATE |
| 2014-08-12T10:30:14.7094893+10:00 Reading received COMMIT |
| 2014-08-12T10:30:14.7094893+10:00 Commit start |
| 2014-08-12T10:30:14.7406893+10:00 Commit end |
| 2014-08-12T10:30:14.7406893+10:00 Reading received FINISH |
| 2014-08-12T10:30:23.3206893+10:00 Reading received START |
| 2014-08-12T10:30:23.3206893+10:00 Reading received ADD |
| 2014-08-12T10:30:23.3362893+10:00 Reading received UPDATE |
| 2014-08-12T10:30:23.3362893+10:00 Reading received COMMIT |
| 2014-08-12T10:30:23.3362893+10:00 Commit start |
| 2014-08-12T10:30:23.3674893+10:00 Commit end |
| 2014-08-12T10:30:23.3674893+10:00 Reading received FINISH |
+--------------------------------------------------------------+
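Suppose you have time-series data that records when certain events occurred, and you want to compute the time differences between those events: for example, the average time between each "Reading received START" and the "Reading received FINISH" that follows it.
Is there a better way to do it than this?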
left = df[df.Event == 'Reading received START']
right = df[df.Event == 'Reading received FINISH']
left.index = range(len(left))
right.index = range(len(right))
delta = (right.Time - left.Time)
1 Answer
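To be clear, I'm assuming that what you've shown is the index and one column (called "Event") of a larger DataFrame. Is that right?
If so, you could try the following: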
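import pandas as pd
# Keep only the START and FINISH rows (timestamps are assumed to be the index).
relevant_df = df[df.Event.isin(['Reading received START', 'Reading received FINISH'])]
# Turn the index into a Series so each timestamp can be compared with the previous one.
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()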
You can then call diff.mean() if you want the average.
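One thing worth checking before averaging: because the filtered rows alternate between START and FINISH, diff interleaves START-to-FINISH durations with FINISH-to-next-START waits. The sketch below illustrates this on a small hypothetical frame (timestamps shortened to milliseconds from the log above, placed in the index as assumed here). The positional slice iloc[1::2] is my own suggestion and only holds if the rows strictly alternate and begin with a START.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# Hypothetical sample: two START/FINISH cycles, timestamps in the index.
times = pd.to_datetime([
    "2014-08-12 10:30:14.693",
    "2014-08-12 10:30:14.740",
    "2014-08-12 10:30:23.320",
    "2014-08-12 10:30:23.367",
])
df = pd.DataFrame({"Event": [START, FINISH, START, FINISH]}, index=times)

relevant_df = df[df.Event.isin([START, FINISH])]
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()

# diff holds NaT, then START->FINISH, FINISH->START, START->FINISH.
# Assuming strict alternation beginning with START, the odd positions
# are the START->FINISH durations.
start_to_finish = diff.iloc[1::2]
mean_delta = start_to_finish.mean()
```

Calling diff.mean() directly on this sample would blend the two short durations with the much longer wait between cycles, so the slice matters when only the START-to-FINISH time is of interest.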
I'd bet there is a more elegant way than turning the index into a Series, but this should do what you need.
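As one possible alternative, here is a minimal sketch that pairs each START with its FINISH explicitly instead of relying on row positions. It assumes the layout from the question (columns named Time and Event); the cycle column and the sample rows are hypothetical, introduced only for illustration.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# Hypothetical sample with an unrelated event mixed in.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2014-08-12 10:30:14.693",
        "2014-08-12 10:30:14.740",
        "2014-08-12 10:30:14.740",
        "2014-08-12 10:30:23.320",
        "2014-08-12 10:30:23.367",
    ]),
    "Event": [START, "Commit end", FINISH, START, FINISH],
})

relevant = df[df.Event.isin([START, FINISH])].copy()
# Each START opens a new cycle, so a running count of STARTs labels the cycles.
relevant["cycle"] = (relevant.Event == START).cumsum()
# One row per cycle, one column per event type.
per_cycle = relevant.pivot(index="cycle", columns="Event", values="Time")
delta = per_cycle[FINISH] - per_cycle[START]
mean_delta = delta.mean()
```

A cycle that has a START but no FINISH simply yields NaT, which mean() skips. pivot raises an error if a cycle contains two FINISH rows, which can be a useful signal that the log is not as regular as assumed.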