Getting the start and end time of different periods in Pandas

Posted 2024-04-20 10:15:03

I have this DataFrame:

timestamp            Val1 
2020-04-02 06:44:00  NaN    
2020-04-03 16:52:00  NaN
2020-04-03 16:53:00  NaN
2020-04-03 16:54:00  NaN
2020-04-03 16:55:00  NaN
2020-04-17 02:03:00  NaN
2020-04-17 02:04:00  NaN
2020-04-17 02:05:00  NaN
2020-04-17 02:06:00  NaN

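I am trying to group the rows into runs of consecutive minutes. That is, rows whose timestamps differ by more than 1 minute should not end up in the same group. The output would look like this: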

#Group 1
timestamp            Val1
2020-04-02 06:44:00  NaN

#Group 2
timestamp            Val1
2020-04-03 16:52:00  NaN
2020-04-03 16:53:00  NaN
2020-04-03 16:54:00  NaN
2020-04-03 16:55:00  NaN


#Group 3
timestamp            Val1             
2020-04-17 02:03:00  NaN
2020-04-17 02:04:00  NaN
2020-04-17 02:05:00  NaN
2020-04-17 02:06:00  NaN

Right now I can only get the minimum and maximum over all the data, which is not what I want.


1 answer
User
#1 · Posted 2024-04-20 10:15:03

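Take the difference between consecutive rows and check whether it is greater than the desired gap ('1min'). The first row has no predecessor, so its difference is NaT and the comparison is False. Taking the cumsum of this boolean Series then creates the group labels, because the running total increases by one at every row that starts a new run. I have assigned it to a column here for illustration: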

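# Convert first if 'timestamp' is not already a datetime column:
# df['timestamp'] = pd.to_datetime(df['timestamp'])
# A gap of more than 1 minute starts a new group; cumsum numbers the groups.
df['group'] = df['timestamp'].diff().gt('1min').cumsum()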

            timestamp  Val1  group
0 2020-04-02 06:44:00   NaN      0
1 2020-04-03 16:52:00   NaN      1
2 2020-04-03 16:53:00   NaN      1
3 2020-04-03 16:54:00   NaN      1
4 2020-04-03 16:55:00   NaN      1
5 2020-04-17 02:03:00   NaN      2
6 2020-04-17 02:04:00   NaN      2
7 2020-04-17 02:05:00   NaN      2
8 2020-04-17 02:06:00   NaN      2
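
To get from these labels to the start and end of each period, which is what the title asks for, one option is to group by the label and take the minimum and maximum timestamp. The following is only a sketch under that reading of the question. It rebuilds the sample data from the question, and the names `periods`, `start` and `end` are illustrative, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question; Val1 is NaN in every row.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-04-02 06:44:00",
        "2020-04-03 16:52:00",
        "2020-04-03 16:53:00",
        "2020-04-03 16:54:00",
        "2020-04-03 16:55:00",
        "2020-04-17 02:03:00",
        "2020-04-17 02:04:00",
        "2020-04-17 02:05:00",
        "2020-04-17 02:06:00",
    ]),
    "Val1": float("nan"),
})

# Same labelling as in the answer: a gap of more than 1 minute starts a new group.
df["group"] = df["timestamp"].diff().gt(pd.Timedelta("1min")).cumsum()

# First and last timestamp of each run ("start" and "end" are hypothetical names).
periods = df.groupby("group")["timestamp"].agg(start="min", end="max")
print(periods)
```

This assumes the rows are already sorted by timestamp. If they are not, sorting with `df.sort_values('timestamp')` before calling `diff()` would be needed for the labels to be meaningful.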
