How to merge two dataframes based on time intervals and transform the result

Posted 2024-06-10 08:38:25


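I have two dataframes: the first is created manually by users, and the second contains errors reported by machines. I want to merge them based on the time intervals in the first dataframe (df_a).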

Here are the dataframes:

import pandas as pd

d_a = {'Station' : ['A1','A2'],
       'Reason_a' : ['Electronic','Feed'],
       'StartTime_a' : ['2019-01-02 02:00:00','2019-01-02 04:22:00'],
       'EndTime_a' : ['2019-01-02 02:20:00', '2019-01-02 04:45:00']}

d_b = {'Station' : ['A1','A1','A1','A2','A2','A2'],
       'Reason_b' : ['a','n','c','d','e','n'],
       'StartTime_b' : ['2019-01-02 00:00:00.000','2019-01-02 00:05:00.000','2019-01-01 23:55:00.000','2019-01-02 04:19:53.000','2019-01-02 04:19:37.000','2019-01-02 04:23:00.000'],
       'EndTime_b' : ['2019-01-02 00:19:15.000','2019-01-02 00:29:45.000','2019-01-02 00:12:12.000','2019-01-02 04:27:12.000','2019-01-02 04:47:16.000','2019-01-02 04:52:45.000']}

df_a = pd.DataFrame(d_a)
df_b = pd.DataFrame(d_b)

Any intersection between the time intervals of the two dataframes counts as a valid record.

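condition1 = the df_b start time is after the df_a start time, and the df_b end time is before the df_a end time

condition2 = the df_b start time is before the df_a start time, but the df_b end time is before the df_a end time

condition3 = the df_b start time is between the df_a start time and the df_a end time, but the df_b end time is after the df_a end time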
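The three conditions can be collapsed into a single predicate. The following is a minimal sketch that encodes them literally as stated; the function name `matches_conditions` is made up for illustration and does not come from the question. Note that condition 2, as written, does not require the df_b interval to end after the df_a interval starts, and a df_b interval that completely covers the df_a interval satisfies none of the three.

```python
import pandas as pd

def matches_conditions(start_a, end_a, start_b, end_b):
    """Return True if the b interval satisfies any of the three stated conditions."""
    # condition1: b starts after a starts and ends before a ends
    cond1 = (start_b > start_a) and (end_b < end_a)
    # condition2: b starts before a starts and ends before a ends
    cond2 = (start_b < start_a) and (end_b < end_a)
    # condition3: b starts inside a and ends after a ends
    cond3 = (start_a < start_b < end_a) and (end_b > end_a)
    return cond1 or cond2 or cond3

# The A2 interval from df_a in the question
start_a = pd.Timestamp('2019-01-02 04:22:00')
end_a = pd.Timestamp('2019-01-02 04:45:00')

# Reason 'd': starts before a and ends inside a, so condition2 holds
d_match = matches_conditions(start_a, end_a,
                             pd.Timestamp('2019-01-02 04:19:53'),
                             pd.Timestamp('2019-01-02 04:27:12'))
# Reason 'e': covers a completely, so none of the three conditions holds
e_match = matches_conditions(start_a, end_a,
                             pd.Timestamp('2019-01-02 04:19:37'),
                             pd.Timestamp('2019-01-02 04:47:16'))
print(d_match, e_match)
```

This agrees with the ideal table in the question, where A2 has d = 1 and e = 0.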

Finally, I want to merge the two dataframes based on these conditions. My ideal table looks like this:

 Station     Reason_a        a     n     c     d    e 
  A1         Electronic      1     1     1     0    0
  A2          Feed           0     1     0     1    0

How should I approach this? Any comments would be helpful.

Thanks in advance.


3 answers

I came up with this:

    df_c = pd.merge(df_a,df_b, left_on = 'Station', right_on = 'Station')

Convert the time columns to datetime:

    df_c['StartTime_a'] = pd.to_datetime(df_c['StartTime_a'])
    df_c['StartTime_b'] = pd.to_datetime(df_c['StartTime_b'])
    df_c['EndTime_a'] = pd.to_datetime(df_c['EndTime_a'])
    df_c['EndTime_b'] = pd.to_datetime(df_c['EndTime_b'])

Apply a lambda function:

    df_c['c'] = df_c.apply(lambda x : 1 if (x.StartTime_b > x.StartTime_a) and (x.EndTime_b < x.EndTime_a) 
                   else (1 if  (x.StartTime_b < x.StartTime_a) and (x.EndTime_b < x.EndTime_a) 
                   else (1 if ((x.StartTime_b > x.StartTime_a) and (x.StartTime_b < x.EndTime_a)) and (x.EndTime_b > x.EndTime_a) else 0)), axis=1)

Followed by:

    df_d = df_c.groupby(['Station','Reason_a','Reason_b'])['c'].sum().unstack()
    df_d.fillna(0, inplace=True)
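The row-wise `apply` above can also be expressed with boolean masks over whole columns, which avoids calling a Python function once per row. This is a minimal, self-contained sketch of the same three conditions, not the answerer's own code; the names `merged` and `result` are illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({
    'Station': ['A1', 'A2'],
    'Reason_a': ['Electronic', 'Feed'],
    'StartTime_a': ['2019-01-02 02:00:00', '2019-01-02 04:22:00'],
    'EndTime_a': ['2019-01-02 02:20:00', '2019-01-02 04:45:00']})
df_b = pd.DataFrame({
    'Station': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'],
    'Reason_b': ['a', 'n', 'c', 'd', 'e', 'n'],
    'StartTime_b': ['2019-01-02 00:00:00.000', '2019-01-02 00:05:00.000',
                    '2019-01-01 23:55:00.000', '2019-01-02 04:19:53.000',
                    '2019-01-02 04:19:37.000', '2019-01-02 04:23:00.000'],
    'EndTime_b': ['2019-01-02 00:19:15.000', '2019-01-02 00:29:45.000',
                  '2019-01-02 00:12:12.000', '2019-01-02 04:27:12.000',
                  '2019-01-02 04:47:16.000', '2019-01-02 04:52:45.000']})

# Pair every df_a row with every df_b row of the same station
merged = df_a.merge(df_b, on='Station')
for col in ['StartTime_a', 'EndTime_a', 'StartTime_b', 'EndTime_b']:
    merged[col] = pd.to_datetime(merged[col])

# The three conditions from the question, evaluated column-wise
cond1 = (merged.StartTime_b > merged.StartTime_a) & (merged.EndTime_b < merged.EndTime_a)
cond2 = (merged.StartTime_b < merged.StartTime_a) & (merged.EndTime_b < merged.EndTime_a)
cond3 = ((merged.StartTime_b > merged.StartTime_a)
         & (merged.StartTime_b < merged.EndTime_a)
         & (merged.EndTime_b > merged.EndTime_a))
merged['c'] = (cond1 | cond2 | cond3).astype(int)

# One row per (Station, Reason_a), one column per Reason_b, missing pairs as 0
result = (merged.groupby(['Station', 'Reason_a', 'Reason_b'])['c']
          .sum()
          .unstack(fill_value=0))
print(result)
```

On the sample data this marks a, c and n for A1 and d and n for A2, which matches the ideal table in the question. Passing `fill_value=0` to `unstack` makes the separate `fillna` step unnecessary.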

I would solve this by merging the tables on Station and computing the intersections :D

import numpy as np
import pandas as pd

df = pd.merge(df_a, df_b, on="Station")

# Convert to date
for datevar in ["StartTime_a", "StartTime_b", "EndTime_a", "EndTime_b"]:
    df[datevar] = pd.to_datetime(df[datevar])

# Intersections definition
df["intersection"] = (((df.StartTime_a > df.StartTime_b) & (df.StartTime_a < df.EndTime_b)) |
                      ((df.StartTime_a < df.StartTime_b) & (df.EndTime_a > df.StartTime_b)))

# Count intersections per (Station, Reason_a), one column per Reason_b
(df[["Station", "Reason_a", "Reason_b", "intersection"]]
.pivot_table(index=["Station", "Reason_a"], columns="Reason_b", aggfunc=np.sum)
.fillna(0).astype(int)) 
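For comparison, two intervals overlap exactly when each one starts before the other ends, which reduces the test to two comparisons and also covers equal start times. The following is a minimal sketch under that assumption, using strict inequalities (intervals that merely touch do not count) and `pd.crosstab` for the counting; the variable names are illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({
    'Station': ['A1', 'A2'],
    'Reason_a': ['Electronic', 'Feed'],
    'StartTime_a': ['2019-01-02 02:00:00', '2019-01-02 04:22:00'],
    'EndTime_a': ['2019-01-02 02:20:00', '2019-01-02 04:45:00']})
df_b = pd.DataFrame({
    'Station': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'],
    'Reason_b': ['a', 'n', 'c', 'd', 'e', 'n'],
    'StartTime_b': ['2019-01-02 00:00:00.000', '2019-01-02 00:05:00.000',
                    '2019-01-01 23:55:00.000', '2019-01-02 04:19:53.000',
                    '2019-01-02 04:19:37.000', '2019-01-02 04:23:00.000'],
    'EndTime_b': ['2019-01-02 00:19:15.000', '2019-01-02 00:29:45.000',
                  '2019-01-02 00:12:12.000', '2019-01-02 04:27:12.000',
                  '2019-01-02 04:47:16.000', '2019-01-02 04:52:45.000']})

df = df_a.merge(df_b, on='Station')
for col in ['StartTime_a', 'EndTime_a', 'StartTime_b', 'EndTime_b']:
    df[col] = pd.to_datetime(df[col])

# b overlaps a when b starts before a ends AND b ends after a starts
overlap = (df['StartTime_b'] < df['EndTime_a']) & (df['EndTime_b'] > df['StartTime_a'])

# Keep the overlapping pairs and count them per station/reason
hits = df[overlap]
counts = pd.crosstab([hits['Station'], hits['Reason_a']], hits['Reason_b'])
print(counts)
```

On the sample data only the three A2 rows overlap in this strict sense. The A1 error intervals all end before the A1 interval starts, so A1 does not appear in `counts`, which differs from the ideal table in the question.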

You can use pandas merge_asof to perform this kind of merge.

Assuming "Station" is an additional key for the merge, you can use the following:

df_a['StartTime_a'] = pd.to_datetime(df_a['StartTime_a'])
df_b['StartTime_b'] = pd.to_datetime(df_b['StartTime_b'])
df_a['EndTime_a'] = pd.to_datetime(df_a['EndTime_a'])
df_b['EndTime_b'] = pd.to_datetime(df_b['EndTime_b'])
# merge_asof requires both frames to be sorted by their "on" keys
df_a.sort_values(by='StartTime_a', inplace=True)
df_b.sort_values(by='StartTime_b', inplace=True)
# Merge and filter based on the first condition
cond_1 = pd.merge_asof(df_a, df_b, by='Station', left_on='StartTime_a',
                       right_on='StartTime_b', direction='forward')
cond_1 = cond_1[cond_1['StartTime_b'] <= cond_1['EndTime_a']]
# Merge and filter based on the second condition
cond_2 = pd.merge_asof(df_a, df_b, by='Station', left_on='StartTime_a',
                       right_on='StartTime_b', direction='backward')
cond_2 = cond_2[cond_2['EndTime_b'] <= cond_2['EndTime_a']]
# Merge and filter based on the third condition
cond_3 = pd.merge_asof(df_a, df_b, by='Station', left_on='StartTime_a',
                       right_on='StartTime_b', direction='forward')
cond_3 = cond_3[cond_3['StartTime_b'] <= cond_3['EndTime_a']]
cond_3 = cond_3[cond_3['EndTime_b'] >= cond_3['EndTime_a']]
# Concatenate all matches
res_df = pd.concat([cond_1, cond_2, cond_3], sort=False)
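One thing to keep in mind with this approach: `merge_asof` is a nearest-key join, so each row of the left frame receives at most one row from the right frame. The following is a minimal sketch on the question's data, showing that the backward merge keeps only the closest earlier start per station; the names `left`, `right` and `nearest` are illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({
    'Station': ['A1', 'A2'],
    'Reason_a': ['Electronic', 'Feed'],
    'StartTime_a': ['2019-01-02 02:00:00', '2019-01-02 04:22:00'],
    'EndTime_a': ['2019-01-02 02:20:00', '2019-01-02 04:45:00']})
df_b = pd.DataFrame({
    'Station': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'],
    'Reason_b': ['a', 'n', 'c', 'd', 'e', 'n'],
    'StartTime_b': ['2019-01-02 00:00:00.000', '2019-01-02 00:05:00.000',
                    '2019-01-01 23:55:00.000', '2019-01-02 04:19:53.000',
                    '2019-01-02 04:19:37.000', '2019-01-02 04:23:00.000'],
    'EndTime_b': ['2019-01-02 00:19:15.000', '2019-01-02 00:29:45.000',
                  '2019-01-02 00:12:12.000', '2019-01-02 04:27:12.000',
                  '2019-01-02 04:47:16.000', '2019-01-02 04:52:45.000']})

for col in ['StartTime_a', 'EndTime_a']:
    df_a[col] = pd.to_datetime(df_a[col])
for col in ['StartTime_b', 'EndTime_b']:
    df_b[col] = pd.to_datetime(df_b[col])

# Both frames must be sorted by the keys passed as left_on / right_on
left = df_a.sort_values('StartTime_a')
right = df_b.sort_values('StartTime_b')

# For each df_a row, take the df_b row of the same station whose
# StartTime_b is the latest one not after StartTime_a
nearest = pd.merge_asof(left, right, by='Station', left_on='StartTime_a',
                        right_on='StartTime_b', direction='backward')
print(nearest[['Station', 'Reason_a', 'Reason_b', 'StartTime_b']])
```

The result has exactly one row per df_a row. Other df_b rows for the same station are dropped even if they also satisfy the conditions, so a plain merge on Station followed by a filter may be the safer choice when every matching record should be counted.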
