How to split the difference between two timestamps in a DataFrame when they span more than one month

import pandas as pd

df = pd.DataFrame({
    'id_kanban': [244, 243, 8, 9, 29],
    'component': ['A', 'A', 'B', 'B', 'C'],
    'start': ['2021-01-23 11:51:39', '2021-01-28 08:11:48', '2021-01-30 06:32:32',
              '2021-02-01 11:21:39', '2021-02-02 17:03:18'],
    'end': ['2021-02-11 10:20:21', '2021-01-28 09:13:42', '2021-02-02 08:14:24',
            '2021-04-04 09:22:19', '2021-02-03 13:05:28']
})

   id_kanban component                start                  end
0        244         A  2021-01-23 11:51:39  2021-02-11 10:20:21
1        243         A  2021-01-28 08:11:48  2021-01-28 09:13:42
2          8         B  2021-01-30 06:32:32  2021-02-02 08:14:24
3          9         B  2021-02-01 11:21:39  2021-04-04 09:22:19
4         29         C  2021-02-02 17:03:18  2021-02-03 13:05:28

Desired output, with every row split at each month boundary it crosses:

   id_kanban component                start                  end
0        244         A  2021-01-23 11:51:39  2021-02-01 00:00:00
1        244         A  2021-02-01 00:00:00  2021-02-11 10:20:21
2        243         A  2021-01-28 08:11:48  2021-01-28 09:13:42
3          8         B  2021-01-30 06:32:32  2021-02-01 00:00:00
4          8         B  2021-02-01 00:00:00  2021-02-02 08:14:24
5          9         B  2021-02-01 11:21:39  2021-03-01 00:00:00
6          9         B  2021-03-01 00:00:00  2021-04-01 00:00:00
7          9         B  2021-04-01 00:00:00  2021-04-04 09:22:19
8         29         C  2021-02-02 17:03:18  2021-02-03 13:05:28

2 Answers

User

Answer #1 · edited 2024-06-08 23:29:15

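You can add one column per month to the DataFrame (named January to December, or 1 to 12) and use the start and end times to compute how much of each interval falls in that month.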

First, convert the timestamp columns to datetime objects:

df['start'] = pd.to_datetime(df.start)
df['end'] = pd.to_datetime(df.end)

Assuming you are computing the travel time for one specific year (2021):

year = 2021
for i in range(1, 13):
    # Month window: first day of month i up to the first day of the next month
    beg_date = pd.Timestamp(year=year, month=i, day=1)
    if i != 12:
        end_date = pd.Timestamp(year=year, month=i + 1, day=1)
    else:
        end_date = pd.Timestamp(year=year + 1, month=1, day=1)

    col = f'Travel_time(h)_{i}'
    df[col] = 0.0  # default: no time spent in this month (float, so hours fit)

    # case 1: start and end are both inside the month
    mask = (df.start >= beg_date) & (df.end <= end_date)
    delta = (df.end - df.start).dt.total_seconds() / 3600  # in hours
    df.loc[mask, col] = delta[mask]

    # case 2: start is inside the month, end is after it
    mask = (df.start >= beg_date) & (df.start < end_date) & (df.end > end_date)
    delta = (end_date - df.start).dt.total_seconds() / 3600  # in hours
    df.loc[mask, col] = delta[mask]

    # case 3: start is before the month, end is inside it
    mask = (df.start < beg_date) & (df.end >= beg_date) & (df.end < end_date)
    delta = (df.end - beg_date).dt.total_seconds() / 3600  # in hours
    df.loc[mask, col] = delta[mask]

    # case 4: the interval covers the whole month
    mask = (df.start < beg_date) & (df.end >= end_date)
    df.loc[mask, col] = (end_date - beg_date).total_seconds() / 3600

Output:

df[df.columns[4:8]]
Out[160]: 
   Travel_time(h)_1  Travel_time(h)_2  Travel_time(h)_3  Travel_time(h)_4
0        204.139167        250.339167               0.0          0.000000
1          1.031667          0.000000               0.0          0.000000
2         41.457778         32.240000               0.0          0.000000
3          0.000000        660.639167             744.0         81.371944
4          0.000000         20.036111               0.0          0.000000
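
The four cases above can also be collapsed into a single overlap computation by clamping each interval to the month window. The following is only a sketch of that idea, not the answer's original method: the helper name `hours_in_month` is made up for illustration, and it assumes `start` and `end` are already datetime columns with `start <= end`.

```python
import pandas as pd

df = pd.DataFrame({
    'id_kanban': [244, 243, 8, 9, 29],
    'start': pd.to_datetime(['2021-01-23 11:51:39', '2021-01-28 08:11:48',
                             '2021-01-30 06:32:32', '2021-02-01 11:21:39',
                             '2021-02-02 17:03:18']),
    'end': pd.to_datetime(['2021-02-11 10:20:21', '2021-01-28 09:13:42',
                           '2021-02-02 08:14:24', '2021-04-04 09:22:19',
                           '2021-02-03 13:05:28']),
})

def hours_in_month(frame, year, month):
    """Hypothetical helper: hours of each [start, end] interval inside one month."""
    beg = pd.Timestamp(year=year, month=month, day=1)
    nxt = beg + pd.offsets.MonthBegin(1)  # first day of the following month
    # Clamp the interval to the month window; a negative length means no overlap.
    overlap = frame['end'].clip(upper=nxt) - frame['start'].clip(lower=beg)
    return overlap.clip(lower=pd.Timedelta(0)).dt.total_seconds() / 3600

# One column per month (1 to 12), one row per original interval
hours = pd.DataFrame({m: hours_in_month(df, 2021, m) for m in range(1, 13)})
print(hours[[1, 2, 3, 4]].round(6))
```

Because every interval in the sample lies inside 2021, each row of `hours` should add up to the full `end - start` duration, which is a quick way to sanity-check the result.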

User

Answer #2 · edited 2024-06-08 23:29:15

You can use date_range with a monthly frequency to build a DatetimeIndex of the month boundaries, then return the resulting intervals:

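# Assumes df['start'] and df['end'] were already converted with pd.to_datetime
def find_interval(sr):
    # Month starts ('MS') from the start's day onward, up to the end timestamp
    dti = pd.date_range(sr['start'].normalize(), sr['end'], freq='MS')
    # Keep only boundaries that fall strictly inside the interval
    cuts = dti[(dti > sr['start']) & (dti < sr['end'])].tolist()
    return list(zip([sr['start']] + cuts, cuts + [sr['end']]))

# One (start, end) tuple per row after explode, then expand into two columns
df1 = df.apply(find_interval, axis=1).explode().apply(pd.Series)
df1 = df.drop(columns=['start', 'end']) \
        .join(df1).rename(columns={0: 'start', 1: 'end'})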

Output:

>>> df1
   id_kanban component               start                 end
0        244         A 2021-01-23 11:51:39 2021-02-01 00:00:00
0        244         A 2021-02-01 00:00:00 2021-02-11 10:20:21
1        243         A 2021-01-28 08:11:48 2021-01-28 09:13:42
2          8         B 2021-01-30 06:32:32 2021-02-01 00:00:00
2          8         B 2021-02-01 00:00:00 2021-02-02 08:14:24
3          9         B 2021-02-01 11:21:39 2021-03-01 00:00:00
3          9         B 2021-03-01 00:00:00 2021-04-01 00:00:00
3          9         B 2021-04-01 00:00:00 2021-04-04 09:22:19
4         29         C 2021-02-02 17:03:18 2021-02-03 13:05:28
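
Once the rows are split at month boundaries, the per-month difference is just `end - start` on each piece. Here is a minimal sketch of that follow-up step, using a few rows copied from the output above; the `hours` and `month` column names and the pivot layout are my own choices for illustration, not part of the answer.

```python
import pandas as pd

# A few rows copied from the split output above (ids 244 and 9)
df1 = pd.DataFrame({
    'id_kanban': [244, 244, 9, 9, 9],
    'start': pd.to_datetime(['2021-01-23 11:51:39', '2021-02-01 00:00:00',
                             '2021-02-01 11:21:39', '2021-03-01 00:00:00',
                             '2021-04-01 00:00:00']),
    'end': pd.to_datetime(['2021-02-01 00:00:00', '2021-02-11 10:20:21',
                           '2021-03-01 00:00:00', '2021-04-01 00:00:00',
                           '2021-04-04 09:22:19']),
})

# Each piece lies within a single month, so its start identifies that month
df1['hours'] = (df1['end'] - df1['start']).dt.total_seconds() / 3600
df1['month'] = df1['start'].dt.strftime('%Y-%m')

# One row per id, one column per month, hours summed per cell
monthly = df1.pivot_table(index='id_kanban', columns='month',
                          values='hours', aggfunc='sum', fill_value=0)
print(monthly.round(2))
```

For id 9 this should give the same Feb/Mar/Apr hours as the per-month columns in the first answer, so the two approaches can be cross-checked against each other.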
