How to reset a sequence when consecutive values are interrupted

Posted 2024-03-29 08:18:25

I have a DataFrame that looks like this:

ID          Time
10789890    13:04:10
10778370    13:04:11
10778882    13:04:12
10783746    13:04:14
10783746    13:04:15
10780162    13:04:15
10780418    13:04:15
10777346    13:04:15
10779394    13:04:15
10782210    13:04:15
10781186    13:04:15
10776834    13:04:15
10788866    13:04:15
10788354    13:04:15
10783746    13:04:16
10788866    13:04:16
10781442    13:04:16
10788354    13:04:16
10789890    13:04:16
10782210    13:04:16
10793986    13:04:16
10780162    13:04:16
10778882    13:04:16
10789890    13:04:18
10788354    13:04:18
10783746    13:04:18

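I have a column called Time. I want to add a column named SEQ that works as follows: while the Time values are consecutive (each row is at most one second after the previous row), the sequence keeps counting up; when there is a gap, it resets to 1 and continues from there. I would like my output to look like this: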

ID          Time        SEQ
10789890    13:04:10    1
10778370    13:04:11    2
10778882    13:04:12    3
10783746    13:04:14    1
10783746    13:04:15    2
10780162    13:04:15    3
10780418    13:04:15    4
10777346    13:04:15    5
10779394    13:04:15    6
10782210    13:04:15    7
10781186    13:04:15    8
10776834    13:04:15    9
10788866    13:04:15    10
10788354    13:04:15    11
10783746    13:04:16    12
10788866    13:04:16    13
10781442    13:04:16    14
10788354    13:04:16    15
10789890    13:04:16    16
10782210    13:04:16    17
10793986    13:04:16    18
10780162    13:04:16    19
10778882    13:04:16    20
10789890    13:04:18    1
10788354    13:04:18    2
10783746    13:04:18    3

1 answer
User
#1 · Posted 2024-03-29 08:18:25

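You can convert the column with pd.to_timedelta and take the difference between consecutive rows, then use cumsum to generate a group label that increases wherever the gap exceeds one second, and finally number the rows within each group with groupby and cumcount: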

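import pandas as pd
# Group label: increases by 1 whenever the gap to the previous row exceeds one second.
s = (pd.to_timedelta(df['Time']).diff().fillna(pd.Timedelta(hours=0))
                                  .dt.total_seconds().gt(1).cumsum())
# Number the rows within each group, starting at 1.
df['SEQ'] = df.groupby(s).cumcount().add(1)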

print(df)

          ID      Time  SEQ
0   10789890  13:04:10    1
1   10778370  13:04:11    2
2   10778882  13:04:12    3
3   10783746  13:04:14    1
4   10783746  13:04:15    2
5   10780162  13:04:15    3
6   10780418  13:04:15    4
7   10777346  13:04:15    5
8   10779394  13:04:15    6
9   10782210  13:04:15    7
10  10781186  13:04:15    8
11  10776834  13:04:15    9
12  10788866  13:04:15   10
13  10788354  13:04:15   11
14  10783746  13:04:16   12
15  10788866  13:04:16   13
16  10781442  13:04:16   14
17  10788354  13:04:16   15
18  10789890  13:04:16   16
19  10782210  13:04:16   17
20  10793986  13:04:16   18
21  10780162  13:04:16   19
22  10778882  13:04:16   20
23  10789890  13:04:18    1
24  10788354  13:04:18    2
25  10783746  13:04:18    3
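
To make the one-liner above easier to follow, here is a minimal sketch that splits it into named steps. The sample frame is a shortened, hypothetical version of the question's data, and the intermediate names gap, new_run and group are introduced only for illustration. It also skips the fillna call, relying on the fact that a missing first difference compares as False.

```python
import pandas as pd

# Hypothetical sample: a shortened version of the data in the question.
df = pd.DataFrame({
    "ID": [10789890, 10778370, 10778882, 10783746, 10783746, 10780162, 10789890],
    "Time": ["13:04:10", "13:04:11", "13:04:12", "13:04:14",
             "13:04:15", "13:04:15", "13:04:18"],
})

# Step 1: parse the strings and measure the gap to the previous row, in seconds.
gap = pd.to_timedelta(df["Time"]).diff().dt.total_seconds()

# Step 2: a gap above one second marks the first row of a new run.
# The first row's gap is NaN, and NaN > 1 evaluates to False.
new_run = gap.gt(1)

# Step 3: the running total of those markers gives every run its own label.
group = new_run.cumsum()

# Step 4: count the rows inside each run, starting from 1.
df["SEQ"] = df.groupby(group).cumcount().add(1)

print(df)
```

Rows with identical times have a gap of zero, so they stay in the current run, which matches the expected output in the question.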

Note: if the Time column is already a timedelta, skip the pd.to_timedelta step and use df['Time'].diff().fillna(...) instead of pd.to_timedelta(df['Time']).diff().fillna(...).
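
If the dtype of the column is not known in advance, both cases from the note can be wrapped in one helper. This is only a sketch: the function name add_seq and its parameters time_col and max_gap are hypothetical and not part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_timedelta64_dtype


def add_seq(df, time_col="Time", max_gap="1s"):
    """Return a copy of df with a SEQ column that restarts after each gap.

    add_seq, time_col and max_gap are illustrative names, not pandas API.
    """
    times = df[time_col]
    # Convert only when the column is not already a timedelta.
    if not is_timedelta64_dtype(times):
        times = pd.to_timedelta(times)
    # A gap larger than max_gap starts a new group; the first row's NaT compares as False.
    group = times.diff().gt(pd.Timedelta(max_gap)).cumsum()
    return df.assign(SEQ=df.groupby(group).cumcount().add(1))


# Usage with string times and with a column that is already timedelta.
df_str = pd.DataFrame({"Time": ["13:04:10", "13:04:11", "13:04:14", "13:04:14"]})
df_td = df_str.assign(Time=pd.to_timedelta(df_str["Time"]))
print(add_seq(df_str))
print(add_seq(df_td))
```

Returning a new frame through assign leaves the caller's DataFrame untouched; assigning to df['SEQ'] in place, as the answer does, works equally well.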
