How to make the day/hour label depend on the dataframe

Posted 2024-04-27 00:40:56


I have observations spanning multiple days, and a single customer can be observed on several of them. This is my data:

customer_id   value    timestamp
1             1000     2018-05-28 03:40:00.000
1             1450     2018-05-28 04:40:01.000
1             1040     2018-05-28 05:40:00.000
1             1500     2018-05-29 02:40:00.000
1             1090     2018-05-29 04:40:00.000
3             1060     2018-05-18 03:40:00.000
3             1040     2018-05-18 05:40:00.000
3             1520     2018-05-19 03:40:00.000
3             1490     2018-05-19 04:40:00.000
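
For reproducibility, the sample data above can be built with a small pandas snippet (just a sketch; it assumes the timestamp strings are parsed with pd.to_datetime so the .dt accessor works later):

import pandas as pd

# minimal sketch reproducing the sample data shown above
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 3, 3, 3, 3],
    'value': [1000, 1450, 1040, 1500, 1090, 1060, 1040, 1520, 1490],
    'timestamp': ['2018-05-28 03:40:00.000', '2018-05-28 04:40:01.000',
                  '2018-05-28 05:40:00.000', '2018-05-29 02:40:00.000',
                  '2018-05-29 04:40:00.000', '2018-05-18 03:40:00.000',
                  '2018-05-18 05:40:00.000', '2018-05-19 03:40:00.000',
                  '2018-05-19 04:40:00.000'],
})
# parse the strings into datetime64 so datetime operations are available
df['timestamp'] = pd.to_datetime(df['timestamp'])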

Following up on my earlier question, How do I building dt.hour in 2 days: there the first observation of a customer, 2018-05-28 03:40:00.000, is labeled Day1 - 3, but for another purpose it should be Day1 - 0, i.e. the hour label should count whole hours elapsed since each customer's first observation. So the desired output is:

customer_id   value    timestamp                hour
1             1000     2018-05-28 03:40:00.000  Day1 - 0
1             1450     2018-05-28 04:40:01.000  Day1 - 1
1             1040     2018-05-28 05:40:00.000  Day1 - 2
1             1500     2018-05-29 02:40:00.000  Day1 - 23
1             1090     2018-05-29 04:40:00.000  Day2 - 1
3             1060     2018-05-18 03:40:00.000  Day1 - 0
3             1040     2018-05-18 05:40:00.000  Day1 - 2
3             1520     2018-05-19 03:40:00.000  Day2 - 0
3             1490     2018-05-19 04:40:00.000  Day2 - 1
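
In other words, the label is just the number of whole hours since the customer's first observation, split into a day part and an hour part. For example, the Day1 - 23 row of customer 1 falls 23 hours after the first observation at 2018-05-28 03:40:00, so the arithmetic is simply:

elapsed = 23                                   # whole hours since the customer's first observation
day, hour = elapsed // 24 + 1, elapsed % 24    # day rolls over every 24 hours
print(f'Day{day} - {hour}')                    # Day1 - 23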

Tags: data, in, id, customer, value, dt, days
1 Answer
User
#1 · Posted 2024-04-27 00:40:56

I think you need to add all the missing hours, with asfreq, so that the cumulative count per group is correct:

#floor timestamps to whole hours
df['timestamp'] = df['timestamp'].dt.floor('h')
#add the missing hours per customer group
df = df.set_index('timestamp').groupby('customer_id').apply(lambda x: x.asfreq('h'))
#cumulative count per customer (level 0 of the resulting MultiIndex)
df['hour'] = df.groupby(level=0).cumcount()
#drop the filler rows and the now-duplicated customer_id column
df = df.dropna(subset=['customer_id']).drop(columns='customer_id').reset_index()

#format the counter as DayN - H
df['hour'] = ('Day' + (df['hour'] // 24).add(1).astype(str) +
              ' - ' + (df['hour'] % 24).astype(str))
print(df)
   customer_id           timestamp   value       hour
0            1 2018-05-28 03:00:00  1000.0   Day1 - 0
1            1 2018-05-28 04:00:00  1450.0   Day1 - 1
2            1 2018-05-28 05:00:00  1040.0   Day1 - 2
3            1 2018-05-29 02:00:00  1500.0  Day1 - 23
4            1 2018-05-29 04:00:00  1090.0   Day2 - 1
5            3 2018-05-18 03:00:00  1060.0   Day1 - 0
6            3 2018-05-18 05:00:00  1040.0   Day1 - 2
7            3 2018-05-19 03:00:00  1520.0   Day2 - 0
8            3 2018-05-19 04:00:00  1490.0   Day2 - 1
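
For completeness, the same DayN - H labels for the original rows can also be derived directly from the hours elapsed since each customer's first floored timestamp, without inserting the missing rows. This is only a sketch of an equivalent approach, starting again from the original dataframe:

# sketch: label each original row by whole hours since the customer's first (floored) timestamp
ts = pd.to_datetime(df['timestamp']).dt.floor('h')
elapsed = ((ts - ts.groupby(df['customer_id']).transform('min'))
           .dt.total_seconds() // 3600).astype(int)
df['hour'] = ('Day' + (elapsed // 24 + 1).astype(str) +
              ' - ' + (elapsed % 24).astype(str))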
