Calculating the duration of events in a time series in Python

index                value
2003-01-01 00:00:00  14.5
2003-01-01 01:00:00  15.8
2003-01-01 02:00:00  0
2003-01-01 03:00:00  0
2003-01-01 04:00:00  13.6
2003-01-01 05:00:00  4.3
2003-01-01 06:00:00  13.7
2003-01-01 07:00:00  14.4
2003-01-01 08:00:00  0
2003-01-01 09:00:00  0
2003-01-01 10:00:00  0
2003-01-01 11:00:00  17.2
2003-01-01 12:00:00  0
2003-01-01 13:00:00  5.3
2003-01-01 14:00:00  0
2003-01-01 15:00:00  2.0
2003-01-01 16:00:00  4.0
2003-01-01 17:00:00  0
2003-01-01 18:00:00  0
2003-01-01 19:00:00  3.9
2003-01-01 20:00:00  7.2
2003-01-01 21:00:00  1.0
2003-01-01 22:00:00  1.0
2003-01-01 23:00:00  10.0

2 answers

User

Answer 1 · edited 2024-05-23 19:30:56

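I'm not entirely sure what you are asking for, but I think you want resample(). Please correct me if I have misunderstood the question.

Starting from "Creating pandas dataframe with datetime index and random values in column", I created a DataFrame holding a random time series.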

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(1), freq='H')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'Day': days, 'Value': data})
df = df.set_index('Day')

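Inspect the DataFrame:

[code and output not preserved in the source]

Now resample the DataFrame:

[code not preserved in the source]

This gives: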

Day                 Value
2018-03-18 20:00:00 42.5
2018-03-18 22:00:00 47.5
2018-03-19 00:00:00 44.0
2018-03-19 02:00:00 24.0
2018-03-19 04:00:00 16.5
2018-03-19 06:00:00 12.0
2018-03-19 08:00:00 50.0
2018-03-19 10:00:00 36.0
2018-03-19 12:00:00 57.0
2018-03-19 14:00:00 60.0
2018-03-19 16:00:00 87.0
2018-03-19 18:00:00 41.0
2018-03-19 20:00:00 78.0
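
The resampling call itself did not survive in this copy of the answer. The two-hour spacing and the half-integer values in the output are consistent with a two-hour mean, so the following is a minimal sketch under that assumption and not necessarily the original code. The fixed start time and the names `start` and `two_hourly` are made up here so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed start time (the answer used datetime.now()).
start = pd.Timestamp("2018-03-18 20:15:00")
days = pd.date_range(start, periods=25, freq=pd.Timedelta(hours=1))

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({"Day": days, "Value": data}).set_index("Day")

# Assumption: two-hour bins, aggregated with the mean.
two_hourly = df.resample(pd.Timedelta(hours=2)).mean()
print(two_hourly)
```

With the start time fixed at 20:15, the 25 hourly samples fall into 13 two-hour bins, and the last bin holds a single sample. Passing a `pd.Timedelta` instead of a string alias is only a choice made here to sidestep differences in alias spelling between pandas versions.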

Similarly, you can resample to days, hours, minutes, and so on; I leave that to you. You may want to take a look at

User

Answer 2 · edited 2024-05-23 19:30:56

I believe this is what you are looking for. I have added an explanation to each step of the code.

# create helper columns defining contiguous blocks and day
df['block'] = (df['value'].astype(bool).shift() != df['value'].astype(bool)).cumsum()
df['day'] = df['index'].dt.normalize()

# group by day to get unique block count and value count
session_map = df[df['value'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['value'].astype(bool)].groupby('day')['value'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']

Result

[output not preserved in the source]
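
To try these steps end to end, here is a minimal sketch that applies them to the sample data from the question. It assumes, as the answer's code does, that the timestamps live in a datetime column named `index` and that a zero value means "no event". Building the frame with `pd.date_range` is a shortcut chosen here, not part of the original answer.

```python
import pandas as pd

# Sample data from the question: 24 hourly readings on 2003-01-01.
values = [14.5, 15.8, 0, 0, 13.6, 4.3, 13.7, 14.4, 0, 0, 0, 17.2,
          0, 5.3, 0, 2.0, 4.0, 0, 0, 3.9, 7.2, 1.0, 1.0, 10.0]
df = pd.DataFrame({
    "index": pd.date_range("2003-01-01", periods=24, freq=pd.Timedelta(hours=1)),
    "value": values,
})

# A new block starts whenever the zero/non-zero state changes.
df["block"] = (df["value"].astype(bool).shift() != df["value"].astype(bool)).cumsum()
df["day"] = df["index"].dt.normalize()

# Per day: number of non-zero blocks (sessions) and non-zero hours.
nonzero = df[df["value"].astype(bool)]
session_map = nonzero.groupby("day")["block"].nunique()
hour_map = nonzero.groupby("day")["value"].count()

df["sessions"] = df["day"].map(session_map)
df["hours"] = df["day"].map(hour_map)

# Average duration and amount per session for each day.
res = df.groupby(["day", "hours", "sessions"], as_index=False)["value"].sum()
res["duration"] = res["hours"] / res["sessions"]
res["amount"] = res["value"] / res["sessions"]
print(res)
```

For this one day the data contain 6 non-zero runs covering 15 hours, so the average duration is 2.5 hours per session. Note that a run continuing past midnight would be counted as a session on both days with this approach.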
