Creating a new classification column from timestamps

Event_Code  Timestamp
2053        13/08/2016 11:30
1029        10/09/2016 14:00
2053        02/10/2016 13:15
2053        06/11/2016 16:30
2053        19/11/2016 15:00
2053        03/12/2016 17:30
1029        02/01/2017 15:00
1029        05/02/2017 16:00
2053        11/02/2017 15:00
1029        04/03/2017 15:00
2053        01/04/2017 14:00
1029        21/05/2017 14:00

def label_stage(row):
    if row['Timestamp'] > '2016-08-12' and row['Timestamp'] < '2016-11-07':
        return 0
    if row['Timestamp'] > '2016-11-18' and row['Timestamp'] < '2017-02-06':
        return 1
    if row['Timestamp'] > '2017-02-10' and row['Timestamp'] < '2017-05-22':
        return 2

df['Stages_So'] = df.apply(lambda row: label_stage(row), axis=1)

1 answer

User

#1 · Posted 2024-04-18 18:24:11

You first need to convert the column to datetime with to_datetime, and then compare it against datetime values:

import pandas as pd

# The timestamps are day-first (13/08/2016), so state the format explicitly;
# otherwise a value such as 10/09/2016 can be read as October 9.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')

def label_stage(row):
    # Both bounds are exclusive; a row outside every range returns None.
    if pd.Timestamp('2016-08-12') < row['Timestamp'] < pd.Timestamp('2016-11-07'):
        return 0
    if pd.Timestamp('2016-11-18') < row['Timestamp'] < pd.Timestamp('2017-02-06'):
        return 1
    if pd.Timestamp('2017-02-10') < row['Timestamp'] < pd.Timestamp('2017-05-22'):
        return 2

df['Stages_So'] = df.apply(label_stage, axis=1)
print(df)
    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00          0
1         1029 2016-09-10 14:00:00          0
2         2053 2016-10-02 13:15:00          0
3         2053 2016-11-06 16:30:00          0
4         2053 2016-11-19 15:00:00          1
5         2053 2016-12-03 17:30:00          1
6         1029 2017-01-02 15:00:00          1
7         1029 2017-02-05 16:00:00          1
8         2053 2017-02-11 15:00:00          2
9         1029 2017-03-04 15:00:00          2
10        2053 2017-04-01 14:00:00          2
11        1029 2017-05-21 14:00:00          2
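
The sample timestamps are written day-first (13/08/2016 can only be 13 August), so the way they are parsed decides which stage a row falls into. A minimal sketch of the difference, using one value from the question; the variable names are illustrative only:

```python
import pandas as pd

raw = "10/09/2016 14:00"  # second row of the sample data

# Without a format, an ambiguous string is read as month/day/year.
month_first = pd.to_datetime(raw)

# An explicit format removes the ambiguity: day/month/year.
day_first = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

print(month_first)  # 2016-10-09 14:00:00
print(day_first)    # 2016-09-10 14:00:00
```

Passing dayfirst=True to to_datetime is a looser alternative; the explicit format has the advantage of raising an error on values that do not match it.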

Another, faster solution builds boolean masks and uses numpy.select:

import numpy as np
import pandas as pd

# Day-first timestamps: give the format explicitly to avoid month/day swaps.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')

# One boolean mask per stage; both bounds are exclusive.
m1 = (df['Timestamp'] > '2016-08-12') & (df['Timestamp'] < '2016-11-07')
m2 = (df['Timestamp'] > '2016-11-18') & (df['Timestamp'] < '2017-02-06')
m3 = (df['Timestamp'] > '2017-02-10') & (df['Timestamp'] < '2017-05-22')

# Rows matching no mask get NaN, which makes the column float.
df['Stages_So'] = np.select([m1, m2, m3], [0, 1, 2], default=np.nan)
print(df)
    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00        0.0
1         1029 2016-09-10 14:00:00        0.0
2         2053 2016-10-02 13:15:00        0.0
3         2053 2016-11-06 16:30:00        0.0
4         2053 2016-11-19 15:00:00        1.0
5         2053 2016-12-03 17:30:00        1.0
6         1029 2017-01-02 15:00:00        1.0
7         1029 2017-02-05 16:00:00        1.0
8         2053 2017-02-11 15:00:00        2.0
9         1029 2017-03-04 15:00:00        2.0
10        2053 2017-04-01 14:00:00        2.0
11        1029 2017-05-21 14:00:00        2.0
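
If the number of stages grows, the masks can be generated from a table of ranges instead of being written out one by one. The following is only a sketch under that assumption: it rebuilds the question's sample data so it can run on its own, and `stage_ranges`, `masks` and `labels` are hypothetical names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (day/month/year timestamps).
df = pd.DataFrame({
    "Event_Code": [2053, 1029, 2053, 2053, 2053, 2053,
                   1029, 1029, 2053, 1029, 2053, 1029],
    "Timestamp": ["13/08/2016 11:30", "10/09/2016 14:00", "02/10/2016 13:15",
                  "06/11/2016 16:30", "19/11/2016 15:00", "03/12/2016 17:30",
                  "02/01/2017 15:00", "05/02/2017 16:00", "11/02/2017 15:00",
                  "04/03/2017 15:00", "01/04/2017 14:00", "21/05/2017 14:00"],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %H:%M")

# Hypothetical lookup table: (start, end, label), both bounds exclusive.
stage_ranges = [
    ("2016-08-12", "2016-11-07", 0),
    ("2016-11-18", "2017-02-06", 1),
    ("2017-02-10", "2017-05-22", 2),
]

ts = df["Timestamp"]
masks = [(ts > pd.Timestamp(start)) & (ts < pd.Timestamp(end))
         for start, end, _ in stage_ranges]
labels = [label for _, _, label in stage_ranges]

# Rows that match no range get NaN, so the resulting column is float.
df["Stages_So"] = np.select(masks, labels, default=np.nan)
print(df)
```

With the day-first format every sample row falls inside one of the three ranges, so no NaN appears for this data; the default only matters for timestamps that land in the gaps between stages.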
