Python: merging two DataFrames on keys with a range (tolerance) condition

Posted 2024-04-25 09:02:17


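I want to merge (using how='left') with dataframe_A on the left and dataframe_B on the right. The columns to join on are 'name', 'height', and 'weight'. Height and weight are each allowed to differ by up to 2 (for example, 2 cm for height).
I can't use a for loop because my dataset is so large that it would take 2 days to finish.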


For example:

Input

dataframe_A name: John, height: 170, weight: 70
dataframe_B name: John, height: 172, weight: 69

Output

output_dataframe: name: John, height: 170, weight: 70, money: 100, grade: 1

I have two DataFrames:

import pandas as pd

dataframe_A = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [170, 180, 160, 155],
                            'weight': [70, 88, 60, 65],
                            'money': [100, 1120, 2000, 3000]})

dataframe_B = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [172, 180, 160, 155],
                            'weight': [69, 88, 60, 65],
                            'grade': [1, 2, 3, 4]})

In SQL, it would look something like this:

    SELECT * FROM dataframe_A LEFT JOIN dataframe_B
    ON dataframe_A.name = dataframe_B.name
    AND dataframe_A.height BETWEEN dataframe_B.height - 2 AND dataframe_B.height + 2
    AND dataframe_A.weight BETWEEN dataframe_B.weight - 2 AND dataframe_B.weight + 2
    ;

But I'm not sure how to express this in Python, as I'm still learning.

output_dataframe = pd.merge(dataframe_A, dataframe_B, how='left', on=['name', 'height', 'weight'])  # + ***the range condition***

1 answer
Anonymous user
Answer #1 · Posted 2024-04-25 09:02:17

First use merge on 'name', then filter with Series.between and boolean indexing:

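# Left-join on name only; clashing columns from dataframe_B get a '_' suffix
df = pd.merge(dataframe_A, dataframe_B, on='name', how='left', suffixes=('', '_'))
# between() is inclusive, so a difference of exactly 2 still counts as a match
m1 = df['height'].between(df['height_'] - 2, df['height_'] + 2)
m2 = df['weight'].between(df['weight_'] - 2, df['weight_'] + 2)

# Keep rows where both conditions hold, with dataframe_A's columns plus 'grade'
df = df.loc[m1 & m2, dataframe_A.columns.tolist() + ['grade']]
print(df)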
    name  height  weight  money  grade
0   John     170      70    100      1
1    May     180      88   1120      2
2   Jane     160      60   2000      3
3  Sally     155      65   3000      4
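
One thing to watch: filtering with m1 & m2 after the merge drops every row of dataframe_A that has no partner within the tolerance, so the result behaves like an inner join rather than the LEFT JOIN asked for in the question. With the sample data every row matches, so the difference doesn't show. A minimal sketch of one way to keep the unmatched rows follows. The helper name merge_within_tolerance and the temporary '_row' column are made up for illustration, and May's height in dataframe_B is changed to 190 (not in the original data) so that one row falls outside the range. It also assumes the extra columns of the right frame (here 'grade') don't clash with column names in the left frame.

```python
import pandas as pd

dataframe_A = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [170, 180, 160, 155],
                            'weight': [70, 88, 60, 65],
                            'money': [100, 1120, 2000, 3000]})

# Same as dataframe_B in the question, except May's height is changed to 190
# (a made-up value) so that one row falls outside the tolerance.
dataframe_B = pd.DataFrame({'name': ['John', 'May', 'Jane', 'Sally'],
                            'height': [172, 190, 160, 155],
                            'weight': [69, 88, 60, 65],
                            'grade': [1, 2, 3, 4]})


def merge_within_tolerance(left, right, key, range_cols, tol):
    """Hypothetical helper: left-join `right` onto `left` on `key`, accepting a
    match only if every column in `range_cols` differs by at most `tol`."""
    # Tag each left row so unmatched rows can be restored afterwards.
    # reset_index returns a new frame, so the caller's frame is not modified.
    left = left.reset_index(drop=True)
    left['_row'] = left.index

    # Candidate pairs: all rows sharing the same key.
    cand = left.merge(right, on=key, how='inner', suffixes=('', '_'))

    # Keep only candidates inside the tolerance for every range column.
    mask = pd.Series(True, index=cand.index)
    for col in range_cols:
        mask &= cand[col].between(cand[col + '_'] - tol, cand[col + '_'] + tol)

    # Columns that exist only in `right` (assumed not to clash with `left`).
    extra = [c for c in right.columns if c != key and c not in range_cols]
    matched = cand.loc[mask, ['_row'] + extra]

    # Left-join the accepted matches back so unmatched left rows get NaN.
    return left.merge(matched, on='_row', how='left').drop(columns='_row')


result = merge_within_tolerance(dataframe_A, dataframe_B,
                                'name', ['height', 'weight'], 2)
print(result)
```

Here May stays in the result with a missing grade (NaN) instead of disappearing, which also turns the grade column into floats. Like the answer above, this still builds every same-name pair before filtering, so memory use can grow quickly if names repeat many times in both frames.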
