按组按日期之间的最小绝对差值选择行

2024-04-27 16:35:17 发布

您现在位置:Python中文网/ 问答频道 /正文

            A           B      C
0  2002-01-16  2002-02-28   Jack
1  2002-01-16  2002-01-30  Helen
2  2002-01-16  2002-02-28  Peter
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
5  2002-04-27  2002-05-25  Wendy
6  2002-04-27  2002-04-30  Bryan
7  2002-04-27  2002-05-25  Sarah

我想选择行,每个A组的A日期在时间上更接近B日期

输出应为:

            A           B      C
1  2002-01-16  2002-01-30  Helen
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
6  2002-04-27  2002-04-30  Bryan

Tags: 时间bryannickpeterhelenjackjudsarah
2条回答

这是一种方法

# convert columns to datetime
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_datetime)

# calculate absolute difference
df['Diff'] = (df['B'] - df['A']).abs()

# filter for difference equal to mapped minimum
res = df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]

结果:

           A          B      C    Diff
1 2002-01-16 2002-01-30  Helen 14 days
3 2002-01-16 2002-01-30    Jud 14 days
4 2002-04-27 2002-04-30   Nick  3 days
6 2002-04-27 2002-04-30  Bryan  3 days

用途:

df = df[df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min())]
print (df)
           A          B      C
1 2002-01-16 2002-01-30  Helen
3 2002-01-16 2002-01-30    Jud
4 2002-04-27 2002-04-30   Nick
6 2002-04-27 2002-04-30  Bryan

详细信息:

print (df['B'].sub(df['A']))

0   43 days
1   14 days
2   43 days
3   14 days
4    3 days
5   28 days
6    3 days
7   28 days
dtype: timedelta64[ns]

print (df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min()))
0    False
1     True
2    False
3     True
4     True
5    False
6     True
7    False
dtype: bool

相关问题 更多 >