Python/Pandas: Binning timedelta data

Published 2024-04-19 07:22:11


I have a DataFrame with two columns:

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

The "duration" column is of type timedelta. I tried using

[code snippet not preserved in the source]

However, the binned data does not use the specified bins; instead, a separate bin is created for each duration in the frame.

What is the simplest way to put timedelta objects into irregular bins? Or am I missing something obvious?


2 Answers

This works for me with pandas 0.23.4:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [pd.Timedelta('3 hours 8 minutes 49 seconds'), pd.Timedelta('35 minutes 50 seconds'), pd.Timedelta('1 minutes 13 seconds'), pd.Timedelta('6 minutes 43 seconds')]
})

bins = [
    pd.Timedelta(minutes = 0),
    pd.Timedelta(minutes = 5),
    pd.Timedelta(minutes = 10),
    pd.Timedelta(minutes = 20),
    pd.Timedelta(minutes = 30),
    pd.Timedelta(hours = 4)
]

labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

Result:

[image: resulting DataFrame, not preserved]
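One thing to watch with the bins above: the last label reads "30min+", but the top edge is 4 hours, so any longer duration falls outside every bin and comes back as NaN. Here is a minimal sketch of that behaviour and one possible workaround. The extra 'baz' row and the 365-day cap are my own assumptions for illustration, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    # 'baz' is a made-up row with a duration longer than 4 hours
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar', 'baz'],
    'duration': pd.to_timedelta(
        ['03:08:49', '00:35:50', '00:01:13', '00:06:43', '05:00:00']),
})

lower_edges = [pd.Timedelta(minutes=m) for m in (0, 5, 10, 20, 30)]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# Same edges as above: the top edge is 4 hours, so 05:00:00 gets NaN
bins_4h = lower_edges + [pd.Timedelta(hours=4)]
df['bins_4h'] = pd.cut(df['duration'], bins_4h, labels=labels)

# Assumed cap of 365 days so the last bin behaves as "30 minutes or more"
bins_wide = lower_edges + [pd.Timedelta(days=365)]
df['bins_wide'] = pd.cut(df['duration'], bins_wide, labels=labels)

print(df)
# Row counts per label, kept in bin order
print(df['bins_wide'].value_counts(sort=False))
```

Pick the cap to suit your data; anything comfortably above the longest duration you expect will do.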

You can normalize the durations to seconds before binning. This reduces the task to binning plain numbers.

df = pd.DataFrame({'userID': ['A', 'B'],
                   'duration': pd.to_timedelta(['00:08:49', '00:35:50'])})

L = ['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00', '04:00:00']

bins = pd.to_timedelta(L).total_seconds()
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'].dt.total_seconds(), bins, labels=cats)

print(df)

#    duration userID     bins
# 0  00:08:49      A  5-10min
# 1  00:35:50      B   30min+
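A side benefit of working in seconds is that the last edge can be np.inf, which makes the "30min+" bin genuinely open-ended. The sketch below is a variation on the code above, not something from the question: rows 'C' and 'D' are hypothetical, and include_lowest=True is added so that a duration of exactly zero lands in the first bin instead of becoming NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # 'C' (five hours) and 'D' (zero) are hypothetical edge cases
    'userID': ['A', 'B', 'C', 'D'],
    'duration': pd.to_timedelta(['00:08:49', '00:35:50', '05:00:00', '00:00:00']),
})

edges = pd.to_timedelta(['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00'])

# Edges in seconds, with an infinite top edge instead of a fixed 4-hour cap
bins = list(edges.total_seconds()) + [np.inf]
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

seconds = df['duration'].dt.total_seconds()

# Bins are right-closed by default; include_lowest=True also admits exactly 0
df['bins'] = pd.cut(seconds, bins, labels=cats, include_lowest=True)

print(df)
```

If you would rather have left-closed bins (so that exactly 5 minutes counts as "5-10min"), pass right=False instead of include_lowest=True.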
