Computing the time difference between events in pandas

1 vote
1 answer
1533 views
Asked 2025-04-18 17:15
+--------------------------------------------------------------+
| 2014-08-12T10:30:14.6938893+10:00     Reading received START |
| 2014-08-12T10:30:14.6938893+10:00       Reading received ADD |
| 2014-08-12T10:30:14.7094893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:14.7094893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:14.7094893+10:00               Commit start |
| 2014-08-12T10:30:14.7406893+10:00                 Commit end |
| 2014-08-12T10:30:14.7406893+10:00    Reading received FINISH |
| 2014-08-12T10:30:23.3206893+10:00     Reading received START |
| 2014-08-12T10:30:23.3206893+10:00       Reading received ADD |
| 2014-08-12T10:30:23.3362893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:23.3362893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:23.3362893+10:00               Commit start |
| 2014-08-12T10:30:23.3674893+10:00                 Commit end |
| 2014-08-12T10:30:23.3674893+10:00    Reading received FINISH |
+--------------------------------------------------------------+

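Suppose you have time series data like the above, recording when certain events occurred. You want to compute the time difference between events, for example the average time between a "Reading received START" event and the "Reading received FINISH" event that follows it.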

Is there a better way than this?

left = df[df.Event == 'Reading received START']
right = df[df.Event == 'Reading received FINISH']
left.index = range(len(left))
right.index = range(len(right))
delta = (right.Time - left.Time)
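
For anyone who wants to try this, here is a minimal, runnable sketch of the same approach. It assumes the log has been loaded into a DataFrame with columns named `Time` and `Event` (the question does not show how `df` was built, so that layout is a guess), and it uses only a subset of the rows shown above.

```python
import pandas as pd

# A subset of the log rows above. The column names "Time" and "Event"
# are assumptions; the question does not show how df was created.
rows = [
    ("2014-08-12T10:30:14.6938893+10:00", "Reading received START"),
    ("2014-08-12T10:30:14.7094893+10:00", "Commit start"),
    ("2014-08-12T10:30:14.7406893+10:00", "Reading received FINISH"),
    ("2014-08-12T10:30:23.3206893+10:00", "Reading received START"),
    ("2014-08-12T10:30:23.3362893+10:00", "Commit start"),
    ("2014-08-12T10:30:23.3674893+10:00", "Reading received FINISH"),
]
df = pd.DataFrame(rows, columns=["Time", "Event"])
df["Time"] = pd.to_datetime(df["Time"])  # parse the ISO 8601 strings

# Select the START and FINISH timestamps and renumber both from 0,
# so the subtraction lines them up by position rather than by original row label.
starts = df.loc[df["Event"] == "Reading received START", "Time"].reset_index(drop=True)
finishes = df.loc[df["Event"] == "Reading received FINISH", "Time"].reset_index(drop=True)

delta = finishes - starts   # one Timedelta per START/FINISH pair
mean_delta = delta.mean()
print(mean_delta)
```

This only pairs rows correctly when every START has exactly one matching FINISH and the two appear in the same order.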

1 Answer

2

To be clear, I'm assuming that what you show is the index and one column (called "Event") of a larger DataFrame. Is that right?

If so, you could try the following:

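# Keep only the START and FINISH rows (the timestamps are assumed to be the index)
relevant_df = df[df.Event.isin(['Reading received START', 'Reading received FINISH'])]
# Turn the index into a Series so that consecutive timestamps can be subtracted
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()

If you like, you can then use diff.mean(). Note that diff holds the gap between every pair of consecutive rows, so it also includes the gaps from each FINISH to the next START. If the rows strictly alternate START, FINISH, START, FINISH, then diff.iloc[1::2] keeps only the START-to-FINISH durations.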

I bet there is a more elegant way than turning the index into a Series, but this should do what you need.
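
One possible alternative, offered as a sketch rather than the definitive way: number each START-to-FINISH cycle with a cumulative count of START rows, then pivot so that each cycle has its START and FINISH timestamps side by side. The column names `Time` and `Event`, the helper column `cycle`, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# A subset of the log rows from the question; the column names are assumed.
rows = [
    ("2014-08-12T10:30:14.6938893+10:00", START),
    ("2014-08-12T10:30:14.7094893+10:00", "Commit start"),
    ("2014-08-12T10:30:14.7406893+10:00", "Commit end"),
    ("2014-08-12T10:30:14.7406893+10:00", FINISH),
    ("2014-08-12T10:30:23.3206893+10:00", START),
    ("2014-08-12T10:30:23.3362893+10:00", "Commit start"),
    ("2014-08-12T10:30:23.3674893+10:00", "Commit end"),
    ("2014-08-12T10:30:23.3674893+10:00", FINISH),
]
df = pd.DataFrame(rows, columns=["Time", "Event"])
df["Time"] = pd.to_datetime(df["Time"])

# Label each cycle: the counter goes up by one at every START row,
# so all rows up to the next START share the same cycle number.
df["cycle"] = (df["Event"] == START).cumsum()

# Keep only the two boundary events and reshape to one row per cycle,
# with one column per event type.
ends = df[df["Event"].isin([START, FINISH])]
wide = ends.pivot(index="cycle", columns="Event", values="Time")

durations = wide[FINISH] - wide[START]  # NaT where a cycle has no FINISH
mean_duration = durations.mean()        # NaT values are skipped by default
print(durations)
print(mean_duration)
```

Compared with pairing by position, this keeps each FINISH tied to the START that precedes it, so a cycle with a missing FINISH shows up as NaT instead of shifting every later pair. `pivot` raises an error if a cycle contains two rows with the same event, which can be a useful check on the log.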
