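Suppose I have a combined DataFrame named df, shown below. Each row holds one building's information together with the information of the building it was matched to. I want to merge in the id of each building from df1, df2 and df3 (see below). The df_num and matched_df_num columns indicate which DataFrame the building information comes from: 1 means df1, 2 means df2, and 3 means df3.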
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2
1       2   Shenzhen           Ping An Finance Centre               2
2       2   Shenzhen           Ping An Finance Centre               3
3       2  Guangzhou     Guangzhou CTF Finance Centre               3
4       3   Shanghai  Shanghai World Financial Center               3

   matched_city                     matched_name  similarity_ratio
0      Shenzhen           Ping An Finance Centre                51
1     Guangzhou     Guangzhou CTF Finance Centre                66
2      Shanghai  Shanghai World Financial Center                59
3      Shanghai  Shanghai World Financial Center                56
4      Changsha            Changsha IFS Tower T1                57
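I want to merge in the id columns from df1, df2 and df3 so that both the building name and the matched name get their ids.
This is my expected result: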
   df_num       city                             name          id  \
0       1   Shenzhen                      Kingkey 100  1010667356
1       2   Shenzhen           Ping An Finance Centre      190010
2       2   Shenzhen           Ping An Finance Centre      190010
3       2  Guangzhou     Guangzhou CTF Finance Centre      190012
4       3   Shanghai  Shanghai World Financial Center       ZY-13

   matched_df_num  matched_city                     matched_name  \
0               2      Shenzhen           Ping An Finance Centre
1               2     Guangzhou     Guangzhou CTF Finance Centre
2               3      Shanghai  Shanghai World Financial Center
3               3      Shanghai  Shanghai World Financial Center
4               3      Changsha            Changsha IFS Tower T1

   similarity_ratio  matched_id
0                51      190010
1                66      190012
2                59       ZY-13
3                56       ZY-13
4                57       ZY-16
How can I add the two new columns id and matched_id using Pandas?
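The question does not show df1, df2 and df3 themselves. A minimal sketch of what they might look like, reconstructed from the ids in the expected result, is given here. The column names id, city and name are assumptions taken from the merge keys used in the update further down, 190012 is taken as the id of Guangzhou CTF Finance Centre (as in row 3), and the ids are kept as strings because some of them (ZY-13) are not numeric.

```python
import pandas as pd

# Hypothetical reconstruction: the real df1, df2 and df3 are not shown
# in the question, so contents are inferred from the expected result.
df1 = pd.DataFrame({
    "id": ["1010667356"],
    "city": ["Shenzhen"],
    "name": ["Kingkey 100"],
})
df2 = pd.DataFrame({
    "id": ["190010", "190012"],
    "city": ["Shenzhen", "Guangzhou"],
    "name": ["Ping An Finance Centre", "Guangzhou CTF Finance Centre"],
})
df3 = pd.DataFrame({
    "id": ["ZY-13", "ZY-16"],
    "city": ["Shanghai", "Changsha"],
    "name": ["Shanghai World Financial Center", "Changsha IFS Tower T1"],
})

# Print each source frame so it can be compared with the tables above.
for num, frame in enumerate([df1, df2, df3], start=1):
    print(f"df{num}")
    print(frame, end="\n\n")
```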
Update: my attempted solution:
df = df.merge(df1, on = ['city', 'name'], how = 'left').merge(df2, on = ['city', 'name'], how = 'left').merge(df3, on = ['city', 'name'], how = 'left')
final_df = df.merge(df1, left_on = ['matched_city', 'matched_name'], right_on = ['city', 'name'], how = 'left').merge(df2, left_on = ['matched_city', 'matched_name'], right_on = ['city', 'name'], how = 'left').merge(df3, left_on = ['matched_city', 'matched_name'], right_on = ['city', 'name'], how = 'left')
   df_num     city_x                           name_x  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2
1       2   Shenzhen           Ping An Finance Centre               2
2       2   Shenzhen           Ping An Finance Centre               3
3       2  Guangzhou     Guangzhou CTF Finance Centre               3
4       3   Shanghai  Shanghai World Financial Center               3

   matched_city                     matched_name  similarity_ratio        id_x  \
0      Shenzhen           Ping An Finance Centre                51  1010667356
1     Guangzhou     Guangzhou CTF Finance Centre                66         NaN
2      Shanghai  Shanghai World Financial Center                59         NaN
3      Shanghai  Shanghai World Financial Center                56         NaN
4      Changsha            Changsha IFS Tower T1                57         NaN

     id_y   id_x  id_y  city_y  name_y    id_x     city_x  \
0     NaN    NaN   NaN     NaN     NaN  190010   Shenzhen
1  190010    NaN   NaN     NaN     NaN  190012  Guangzhou
2  190010    NaN   NaN     NaN     NaN     NaN        NaN
3  190012    NaN   NaN     NaN     NaN     NaN        NaN
4     NaN  ZY-13   NaN     NaN     NaN     NaN        NaN

                         name_x   id_y    city_y  \
0        Ping An Finance Centre    NaN       NaN
1  Guangzhou CTF Finance Centre    NaN       NaN
2                           NaN  ZY-13  Shanghai
3                           NaN  ZY-13  Shanghai
4                           NaN  ZY-16  Changsha

                            name_y
0                              NaN
1                              NaN
2  Shanghai World Financial Center
3  Shanghai World Financial Center
4            Changsha IFS Tower T1
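You can use merge with a left join. Then merge again on the matched columns; to avoid duplicated columns, apply rename first. Edit: you can first add a new value to each DataFrame, and then merge on that column as well.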
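A minimal sketch of that last approach follows. It is not the answerer's original code, which did not survive on this page. The use of concat and assign, the names lookup, matched_lookup and out, and the contents of df1, df2 and df3 (reconstructed from the ids shown in the question) are all assumptions.

```python
import pandas as pd

# Hypothetical source frames, inferred from the ids in the question.
df1 = pd.DataFrame({"id": ["1010667356"], "city": ["Shenzhen"],
                    "name": ["Kingkey 100"]})
df2 = pd.DataFrame({"id": ["190010", "190012"],
                    "city": ["Shenzhen", "Guangzhou"],
                    "name": ["Ping An Finance Centre",
                             "Guangzhou CTF Finance Centre"]})
df3 = pd.DataFrame({"id": ["ZY-13", "ZY-16"],
                    "city": ["Shanghai", "Changsha"],
                    "name": ["Shanghai World Financial Center",
                             "Changsha IFS Tower T1"]})

# The combined frame from the question.
df = pd.DataFrame({
    "df_num": [1, 2, 2, 2, 3],
    "city": ["Shenzhen", "Shenzhen", "Shenzhen", "Guangzhou", "Shanghai"],
    "name": ["Kingkey 100", "Ping An Finance Centre",
             "Ping An Finance Centre", "Guangzhou CTF Finance Centre",
             "Shanghai World Financial Center"],
    "matched_df_num": [2, 2, 3, 3, 3],
    "matched_city": ["Shenzhen", "Guangzhou", "Shanghai", "Shanghai",
                     "Changsha"],
    "matched_name": ["Ping An Finance Centre",
                     "Guangzhou CTF Finance Centre",
                     "Shanghai World Financial Center",
                     "Shanghai World Financial Center",
                     "Changsha IFS Tower T1"],
    "similarity_ratio": [51, 66, 59, 56, 57],
})

# Tag each source frame with its number, then stack them into one lookup.
lookup = pd.concat(
    [frame.assign(df_num=num)
     for num, frame in enumerate([df1, df2, df3], start=1)],
    ignore_index=True,
)

# First left join: the id of the building itself.
out = df.merge(lookup, on=["df_num", "city", "name"], how="left")

# Rename the lookup columns to their matched_* counterparts so the second
# left join adds only matched_id and creates no _x/_y duplicates.
matched_lookup = lookup.rename(columns={
    "id": "matched_id",
    "df_num": "matched_df_num",
    "city": "matched_city",
    "name": "matched_name",
})
out = out.merge(
    matched_lookup,
    on=["matched_df_num", "matched_city", "matched_name"],
    how="left",
)

print(out[["df_num", "name", "id", "matched_df_num", "matched_name",
           "matched_id"]])
```

Joining on df_num as well as city and name means each row is looked up only in the frame it came from, so one merge per side is enough and the columns of the three source frames never collide.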
Give it a try; it may help you solve the problem.