How to number each consecutive night in a DataFrame using Python

Posted 2024-04-18 20:23:55

In Python 3.x, I have some populated lists that I put into a DataFrame, with "ID" and "UTCTime" as columns:

import pandas as pd
df = pd.DataFrame({'ID': ID, 'UTCTime': UTCTime})
print(df)

    ID              UTCTime
3    4  2021-04-03 21:56:53
4    5  2021-04-03 21:56:55
5    6  2021-04-03 21:56:57
6    7  2021-04-03 22:00:07
7    8  2021-04-03 22:00:09
8    9  2021-04-03 22:00:11
9   10  2021-04-03 22:05:08
10  11  2021-04-03 22:05:10
... 

This is what the data looks like, and it continues for several months. The intervals between timestamps are not equal.

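Now I want to add an extra column, "NightID", that identifies each consecutive night and gives it a unique number. Times between 18:00 and 06:00 should be selected and labeled "1", the following night labeled "2", and so on through the whole dataset (in the sample below, rows outside that window get 0). It would look like this: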

ID  UTCTime  nightID
1   27/04/2021 18:00    1
2   27/04/2021 21:05    1
3   28/04/2021 01:50    1
4   28/04/2021 02:25    1
5   28/04/2021 02:30    1
6   28/04/2021 05:59    1
7   28/04/2021 15:00    0
8   28/04/2021 18:00    2
9   28/04/2021 19:40    2
10  28/04/2021 23:50    2
11  29/04/2021 00:05    2
12  29/04/2021 00:35    2
13  29/04/2021 01:55    2
14  29/04/2021 02:30    2
15  29/04/2021 05:59    2
16  29/04/2021 06:01    0
17  29/04/2021 18:01    3
18  29/04/2021 22:30    3
19  29/04/2021 22:35    3

Do you have any idea how to do this?

Thank you very much for your help.

Regards, Martijn


Tags: data, import, id, dataframe, pandas, df, list, interval
2 Answers

Let's try something like this:

import numpy as np
import pandas as pd

# Ensure UTCTime is a datetime column
df['UTCTime'] = pd.to_datetime(df['UTCTime'])

# Mask for the times we're interested in (18:00 up to, but not including, 06:00)
m = (18 <= df['UTCTime'].dt.hour) | (df['UTCTime'].dt.hour < 6)
# Shift the times back by 6 hours so that one whole night falls on the same date
# (e.g. 5 am becomes 11 pm the previous day)
df['OffsetTime'] = df['UTCTime'] - pd.Timedelta(hours=6)
# Number the shifted dates starting at 1; rows outside the mask get 0
df['NightId'] = np.where(
    m, df['OffsetTime'].dt.date.astype('category').cat.codes + 1, 0
)

print(df)

Source data:

   ID            UTCTime
3   4   2021-04-03 18:00
4   5   2021-04-03 21:05
5   6   2021-04-04 05:59
6   7   2021-04-04 15:00
7   8   2021-04-04 18:00 
8   9   2021-04-04 19:40

Output:

   ID             UTCTime          OffsetTime  NightId
3   4 2021-04-03 18:00:00 2021-04-03 12:00:00        1
4   5 2021-04-03 21:05:00 2021-04-03 15:05:00        1
5   6 2021-04-04 05:59:00 2021-04-03 23:59:00        1
6   7 2021-04-04 15:00:00 2021-04-04 09:00:00        0
7   8 2021-04-04 18:00:00 2021-04-04 12:00:00        2
8   9 2021-04-04 19:40:00 2021-04-04 13:40:00        2
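
One caveat with numbering by category codes: every shifted date present in the data consumes a code, including dates that only have daytime rows, so the night numbers can skip values. If strictly consecutive numbers are wanted, a minimal sketch is to factorize only the night rows. It assumes the same `UTCTime` column; the sample rows and the `night_date` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: 2021-04-04 has a daytime row but no night rows
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "UTCTime": pd.to_datetime([
        "2021-04-03 21:05",  # night
        "2021-04-04 15:00",  # daytime
        "2021-04-05 19:40",  # night
    ]),
})

# Night rows fall between 18:00 and 06:00
is_night = (df["UTCTime"].dt.hour >= 18) | (df["UTCTime"].dt.hour < 6)
# Shift back 6 hours so a whole night maps to one calendar date
night_date = (df["UTCTime"] - pd.Timedelta(hours=6)).dt.normalize()

# Number only the dates that actually contain night rows
codes, _ = pd.factorize(night_date[is_night], sort=True)
df["NightId"] = 0
df.loc[is_night, "NightId"] = codes + 1

print(df)
```

With this sample the two nights are numbered 1 and 2, whereas the category-code approach would number them 1 and 3.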

Without the extra variables and column:

df['NightId'] = np.where(
    (18 <= df['UTCTime'].dt.hour) | (df['UTCTime'].dt.hour < 6),
    (df['UTCTime'] - pd.Timedelta(hours=6)).dt.date.astype('category').cat.codes + 1,
    0
)

   ID             UTCTime  NightId
3   4 2021-04-03 18:00:00        1
4   5 2021-04-03 21:05:00        1
5   6 2021-04-04 05:59:00        1
6   7 2021-04-04 15:00:00        0
7   8 2021-04-04 18:00:00        2
8   9 2021-04-04 19:40:00        2
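
As a quick check against the sample table in the question, the same expression can be run on a few of its rows. The dates there are written day-first, so this sketch assumes they need an explicit `format` when parsed; the choice of rows is only for illustration.

```python
import numpy as np
import pandas as pd

# A subset of the rows from the question's expected-output table (day-first dates)
df = pd.DataFrame({
    "ID": [1, 6, 7, 8, 15, 16, 17],
    "UTCTime": [
        "27/04/2021 18:00", "28/04/2021 05:59", "28/04/2021 15:00",
        "28/04/2021 18:00", "29/04/2021 05:59", "29/04/2021 06:01",
        "29/04/2021 18:01",
    ],
})

# Parse with an explicit day-first format so 27/04 is not misread
df["UTCTime"] = pd.to_datetime(df["UTCTime"], format="%d/%m/%Y %H:%M")

df["NightId"] = np.where(
    (df["UTCTime"].dt.hour >= 18) | (df["UTCTime"].dt.hour < 6),
    (df["UTCTime"] - pd.Timedelta(hours=6)).dt.date.astype("category").cat.codes + 1,
    0,
)

print(df)
```

For these rows the resulting NightId values agree with the nightID column the question asks for.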

Use shift to find the date changes, and cumsum to create the nightID:

import pandas as pd
df = pd.DataFrame({'ID': range(8), 'UTCTime': pd.to_datetime(['2021-04-03 14:56:53',
                                                              '2021-04-03 18:00:00',
                                                              '2021-04-03 18:56:53',
                                                              '2021-04-04 05:03:24',
                                                              '2021-04-04 06:00:00',
                                                              '2021-04-04 06:10:09',
                                                              '2021-04-04 17:10:34',
                                                              '2021-04-04 18:20:03'])})

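# Shift forward by 6 hours so that each night (18:00 to 06:00) falls on a single
# calendar date, between 00:00 and 12:00
time_shifted = df['UTCTime'] + pd.Timedelta(hours=6)
is_night = (time_shifted.dt.hour < 12)
# Flag rows where the shifted date changes (the first row has no predecessor, so it is not flagged)
night_id = (time_shifted.dt.date != time_shifted.dt.date.shift(1)) & (time_shifted.dt.date.shift(1).notna())
# The running count of date changes is the night number
night_id = night_id.cumsum()
# Default to 0 (not a night), then fill in the night number for the night rows
df['nightID'] = 0
df.loc[is_night, 'nightID'] = night_id

    ID  UTCTime             nightID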
0   0   2021-04-03 14:56:53 0
1   1   2021-04-03 18:00:00 1
2   2   2021-04-03 18:56:53 1
3   3   2021-04-04 05:03:24 1
4   4   2021-04-04 06:00:00 0
5   5   2021-04-04 06:10:09 0
6   6   2021-04-04 17:10:34 0
7   7   2021-04-04 18:20:03 2
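
Note that this counts date changes, so if the data begins with a night row (as the data in the question does), that first night ends up numbered 0. A possible variation, sketched here with made-up sample rows and the hypothetical names `night_date` and `starts`, counts the rows where a night begins instead:

```python
import pandas as pd

# Hypothetical sample whose first row is already inside a night
df = pd.DataFrame({"ID": range(5), "UTCTime": pd.to_datetime([
    "2021-04-03 21:56:53",
    "2021-04-04 05:03:24",
    "2021-04-04 12:00:00",
    "2021-04-04 18:20:03",
    "2021-04-05 19:00:00",  # next night, with no daytime row in between
])})

# Shift forward 6 hours: night rows land between 00:00 and 12:00 of one date
shifted = df["UTCTime"] + pd.Timedelta(hours=6)
is_night = shifted.dt.hour < 12
night_date = shifted.dt.normalize()

# A night starts on a night row that follows a non-night row,
# or whose shifted date differs from the previous row's
prev_is_night = is_night.shift(fill_value=False)
starts = is_night & (~prev_is_night | (night_date != night_date.shift()))

# Running count of night starts; rows outside a night get 0
df["nightID"] = starts.cumsum().where(is_night, 0)

print(df)
```

Here the first two rows get nightID 1, the midday row gets 0, and the last two rows get 2 and 3.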
