How to merge two DataFrames and return only the matching rows/indexes

Posted 2024-05-16 18:49:25

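I have two DataFrames, admissions and scholarship. admissions has the columns studentid, age, continent, sex and gpa; scholarship has only studentid, age and continent.

I want to do the following:

(I) Append the sex and gpa values from the admissions table to the corresponding rows of the scholarship table, returning only the matching rows (with their index numbers)

(II) Where necessary, drop or avoid the _x and _y suffixes in the resulting columns, e.g. age_x, continent_x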

admissions 

 studentid, age,continent,sex,gpa
1, 10910, 18, Asia, F, 3.9 
2, 10014, 19, Europe, M, 3.8
3, 10121, 18, Africa, M, 3.9
4, 10333, 20, Asia, F, 4.0
5, 10215, 19, America, F, 3.8

scholarship

studentid, age,continent
1, 10910, 18, Asia
3, 10121, 18, Africa
4, 10333, 20, Asia

Expected result

scholarship_admissions
# take note of the index number
 studentid, age,continent,sex,gpa
1, 10910, 18, Asia, F, 3.9 
3, 10121, 18, Africa, M, 3.9
4, 10333, 20, Asia, F, 4.0

My attempt

import pandas as pd

scholarship_admissions = pd.merge(admissions, scholarship, on='studentid', how='inner')

Thanks, everyone.


1 answer
User
#1 · Posted 2024-05-16 18:49:25

Your attempt already looks good. To avoid the duplicated columns, you can pass a list of all the shared columns as the merge key:

scholarship_admissions = admissions.merge(scholarship, left_on=["studentid", "age", "continent"], right_on=["studentid", "age", "continent"], how='inner')

Output:

   studentid  age continent sex  gpa
0      10910   18      Asia   F  3.9
1      10121   18    Africa   M  3.9
2      10333   20      Asia   F  4.0
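
As a self-contained sketch of the merge above, the frames below are rebuilt from the sample data in the question (the 1-based index is an assumption taken from the question's listing). It passes the three shared columns via on=, which behaves the same here because the key names are identical on both sides, and contrasts that with merging on studentid alone, which is what produces the _x/_y suffixes.

```python
import pandas as pd

# Sample data copied from the question; the 1..5 index is assumed from its listing.
admissions = pd.DataFrame(
    {
        "studentid": [10910, 10014, 10121, 10333, 10215],
        "age": [18, 19, 18, 20, 19],
        "continent": ["Asia", "Europe", "Africa", "Asia", "America"],
        "sex": ["F", "M", "M", "F", "F"],
        "gpa": [3.9, 3.8, 3.9, 4.0, 3.8],
    },
    index=[1, 2, 3, 4, 5],
)
scholarship = pd.DataFrame(
    {
        "studentid": [10910, 10121, 10333],
        "age": [18, 18, 20],
        "continent": ["Asia", "Africa", "Asia"],
    },
    index=[1, 3, 4],
)

# Merging on one key only: the other shared columns are kept from both
# sides and disambiguated with the default _x/_y suffixes.
single_key = admissions.merge(scholarship, on="studentid", how="inner")

# Merging on every shared column: each key appears once, so no suffixes.
all_keys = admissions.merge(
    scholarship, on=["studentid", "age", "continent"], how="inner"
)

print(list(single_key.columns))
print(list(all_keys.columns))
```

Note that the inner merge only keeps rows where all three key columns agree, so a student whose age or continent differs between the two tables would be dropped.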

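The merge resets the index. To keep the original index, you can turn it into a column before merging and set it as the index again afterwards: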

scholarship_admissions_with_index = admissions.reset_index().merge(scholarship, left_on=["studentid", "age", "continent"], right_on=["studentid", "age", "continent"], how='inner').set_index('index')

Output:

       studentid  age continent sex  gpa
index                                   
0          10910   18      Asia   F  3.9
2          10121   18    Africa   M  3.9
3          10333   20      Asia   F  4.0
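
A minimal sketch of the same idea with the question's 1-based index (an assumption taken from the question's listing), so the result carries the index numbers 1, 3 and 4 that the question asks for. reset_index() names the new column "index" only because the original index has no name; the final rename_axis(None) is an optional extra that drops that label again.

```python
import pandas as pd

# Sample data copied from the question; the 1..5 index is assumed from its listing.
admissions = pd.DataFrame(
    {
        "studentid": [10910, 10014, 10121, 10333, 10215],
        "age": [18, 19, 18, 20, 19],
        "continent": ["Asia", "Europe", "Africa", "Asia", "America"],
        "sex": ["F", "M", "M", "F", "F"],
        "gpa": [3.9, 3.8, 3.9, 4.0, 3.8],
    },
    index=[1, 2, 3, 4, 5],
)
scholarship = pd.DataFrame(
    {
        "studentid": [10910, 10121, 10333],
        "age": [18, 18, 20],
        "continent": ["Asia", "Africa", "Asia"],
    },
    index=[1, 3, 4],
)

keys = ["studentid", "age", "continent"]

scholarship_admissions = (
    admissions.reset_index()              # move the index into an "index" column
    .merge(scholarship, on=keys, how="inner")
    .set_index("index")                   # restore admissions' original labels
    .rename_axis(None)                    # drop the "index" name from the axis
)

print(scholarship_admissions)
```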

In this particular case, you can simply use on instead of specifying both left_on and right_on, because the columns have the same names in both DataFrames:

admissions.merge(scholarship, on=["studentid", "age", "continent"], how='inner')
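
A possible variation, sketched under the assumption that studentid alone uniquely identifies a student in both tables: start from scholarship and pull in only sex and gpa from admissions. Since age and continent are never taken from the right-hand side, no suffixes can appear. The validate="one_to_one" argument is an optional check that makes pandas raise if studentid turns out not to be unique.

```python
import pandas as pd

# Sample data copied from the question.
admissions = pd.DataFrame(
    {
        "studentid": [10910, 10014, 10121, 10333, 10215],
        "age": [18, 19, 18, 20, 19],
        "continent": ["Asia", "Europe", "Africa", "Asia", "America"],
        "sex": ["F", "M", "M", "F", "F"],
        "gpa": [3.9, 3.8, 3.9, 4.0, 3.8],
    }
)
scholarship = pd.DataFrame(
    {
        "studentid": [10910, 10121, 10333],
        "age": [18, 18, 20],
        "continent": ["Asia", "Africa", "Asia"],
    }
)

# Keep only the key plus the columns we want to append, then merge.
result = scholarship.merge(
    admissions[["studentid", "sex", "gpa"]],
    on="studentid",
    how="inner",
    validate="one_to_one",  # assumption: one row per studentid on each side
)

print(result)
```

Unlike the three-column merge, this version does not check that age and continent agree between the two tables.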
