Merge one column from multiple DataFrames into another DataFrame based on multiple conditions in Python

Posted 2024-06-16 14:30:22


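Suppose I have a combined DataFrame named df, shown below. Each row contains a building's information together with the information of the building it was matched to. I want to merge in each building's id from df1, df2 and df3 (see below). The columns df_num and matched_df_num indicate which DataFrame the building information comes from: 1 means it comes from df1, 2 means df2 and 3 means df3.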

   df_num       city                             name  matched_df_num  \
0      1   Shenzhen                      Kingkey 100              2   
1      2   Shenzhen           Ping An Finance Centre              2   
2      2   Shenzhen           Ping An Finance Centre              3   
3      2  Guangzhou     Guangzhou CTF Finance Centre              3   
4      3   Shanghai  Shanghai World Financial Center              3   

  matched_city                     matched_name  similarity_ratio  
0     Shenzhen           Ping An Finance Centre                51  
1    Guangzhou     Guangzhou CTF Finance Centre                66  
2     Shanghai  Shanghai World Financial Center                59  
3     Shanghai  Shanghai World Financial Center                56  
4     Changsha            Changsha IFS Tower T1                57  

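I want to merge in the id column from df1, df2 and df3 so that I get an id for both the building name and the matched name. df1, df2 and df3 look like this:

df1:
           id      city          name
0  1010667747    Suzhou    Suzhou IFS
1  1010667356  Shenzhen   Kingkey 100
2  1010667289     Wuhan  Wuhan Center

df2:
       id       city                          name
0  190010   Shenzhen        Ping An Finance Centre
1  190012  Guangzhou  Guangzhou CTF Finance Centre
2  190015    Beijing                     China Zun

df3:
      id       city                             name
0  ZY-13   Shanghai  Shanghai World Financial Center
1  ZY-15  Hong Kong    International Commerce Centre
2  ZY-16   Changsha            Changsha IFS Tower T1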
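Since df_num records which frame a row came from, one way to prepare for the merge is to stack df1, df2 and df3 into a single lookup table tagged with that number. A minimal sketch, assuming the three frames hold the values that appear in the first answer's concatenated printout (ids stored as strings here for simplicity). Passing a dict to pd.concat turns the dict keys into an extra index level, which reset_index then moves into a df_num column:

```python
import pandas as pd

# Values taken from the first answer's printout; ids kept as strings (assumption)
cols = ["id", "city", "name"]
df1 = pd.DataFrame([["1010667747", "Suzhou", "Suzhou IFS"],
                    ["1010667356", "Shenzhen", "Kingkey 100"],
                    ["1010667289", "Wuhan", "Wuhan Center"]], columns=cols)
df2 = pd.DataFrame([["190010", "Shenzhen", "Ping An Finance Centre"],
                    ["190012", "Guangzhou", "Guangzhou CTF Finance Centre"],
                    ["190015", "Beijing", "China Zun"]], columns=cols)
df3 = pd.DataFrame([["ZY-13", "Shanghai", "Shanghai World Financial Center"],
                    ["ZY-15", "Hong Kong", "International Commerce Centre"],
                    ["ZY-16", "Changsha", "Changsha IFS Tower T1"]], columns=cols)

# Dict keys 1, 2, 3 become an outer index level named "df_num";
# reset_index moves that level into a regular column at the front
lookup = (pd.concat({1: df1, 2: df2, 3: df3}, names=["df_num", "row"])
            .reset_index(level="df_num")
            .reset_index(drop=True))
print(lookup)
```

Merging df against this table on ['df_num', 'city', 'name'] then picks each id only from the frame that df_num points to.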

This is my expected result:

   df_num       city                             name          id  \
0      1   Shenzhen                      Kingkey 100  1010667356   
1      2   Shenzhen           Ping An Finance Centre      190010   
2      2   Shenzhen           Ping An Finance Centre      190010   
3      2  Guangzhou     Guangzhou CTF Finance Centre      190012   
4      3   Shanghai  Shanghai World Financial Center       ZY-13   

   matched_df_num  matched_city                     matched_name  \
0              2     Shenzhen           Ping An Finance Centre   
1              2    Guangzhou     Guangzhou CTF Finance Centre   
2              3     Shanghai  Shanghai World Financial Center   
3              3     Shanghai  Shanghai World Financial Center   
4              3     Changsha            Changsha IFS Tower T1   

   similarity_ratio matched_id  
0                51     190010  
1                66     190012  
2                59      ZY-13  
3                56      ZY-13  
4                57      ZY-16  

How can I use pandas to insert the two new columns id and matched_id with their values? Thanks in advance for your help.

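Update: my solution so far. It produces duplicated id_x/id_y, city_x/city_y and name_x/name_y columns instead of a clean id and matched_id: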

df = (df.merge(df1, on=['city', 'name'], how='left')
        .merge(df2, on=['city', 'name'], how='left')
        .merge(df3, on=['city', 'name'], how='left'))
final_df = (df.merge(df1, left_on=['matched_city', 'matched_name'],
                     right_on=['city', 'name'], how='left')
              .merge(df2, left_on=['matched_city', 'matched_name'],
                     right_on=['city', 'name'], how='left')
              .merge(df3, left_on=['matched_city', 'matched_name'],
                     right_on=['city', 'name'], how='left'))

   df_num     city_x                           name_x  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio        id_x  \
0     Shenzhen           Ping An Finance Centre                51  1010667356   
1    Guangzhou     Guangzhou CTF Finance Centre                66         NaN   
2     Shanghai  Shanghai World Financial Center                59         NaN   
3     Shanghai  Shanghai World Financial Center                56         NaN   
4     Changsha            Changsha IFS Tower T1                57         NaN   

     id_y   id_x id_y city_y name_y    id_x     city_x  \
0     NaN    NaN  NaN    NaN    NaN  190010   Shenzhen   
1  190010    NaN  NaN    NaN    NaN  190012  Guangzhou   
2  190010    NaN  NaN    NaN    NaN     NaN        NaN   
3  190012    NaN  NaN    NaN    NaN     NaN        NaN   
4     NaN  ZY-13  NaN    NaN    NaN     NaN        NaN   

                         name_x   id_y    city_y  \
0        Ping An Finance Centre    NaN       NaN   
1  Guangzhou CTF Finance Centre    NaN       NaN   
2                           NaN  ZY-13  Shanghai   
3                           NaN  ZY-13  Shanghai   
4                           NaN  ZY-16  Changsha   

                            name_y  
0                              NaN  
1                              NaN  
2  Shanghai World Financial Center  
3  Shanghai World Financial Center  
4            Changsha IFS Tower T1  
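The _x/_y columns come from merging the same non-key column (id) more than once: when both sides of a merge have an id column that is not a join key, pandas keeps both and appends the default suffixes. A minimal sketch with cut-down data (one building per source frame, an assumption for brevity), comparing the chained merges with a single merge against the stacked lookup:

```python
import pandas as pd

# Cut-down data: one building per source frame (assumption for brevity)
df = pd.DataFrame({"city": ["Shenzhen", "Shanghai"],
                   "name": ["Kingkey 100", "Shanghai World Financial Center"]})
df1 = pd.DataFrame({"id": ["1010667356"], "city": ["Shenzhen"], "name": ["Kingkey 100"]})
df2 = pd.DataFrame({"id": ["190010"], "city": ["Shenzhen"], "name": ["Ping An Finance Centre"]})
df3 = pd.DataFrame({"id": ["ZY-13"], "city": ["Shanghai"],
                    "name": ["Shanghai World Financial Center"]})

# Chained merges: the second merge meets an existing "id" column, so pandas
# keeps both and renames them with the default suffixes "_x" and "_y"
chained = (df.merge(df1, on=["city", "name"], how="left")
             .merge(df2, on=["city", "name"], how="left")
             .merge(df3, on=["city", "name"], how="left"))
print(chained.columns.tolist())  # ['city', 'name', 'id_x', 'id_y', 'id']

# Stacking the lookups first gives one merge and one id column
lookup = pd.concat([df1, df2, df3], ignore_index=True)
single = df.merge(lookup, on=["city", "name"], how="left")
print(single.columns.tolist())  # ['city', 'name', 'id']
print(single["id"].tolist())    # ['1010667356', 'ZY-13']
```

Each building's id ends up scattered across whichever suffixed column its source frame produced, which is why the output above is hard to read.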

2 answers

You can use concat together with merge and a left join:

dff = pd.concat([df1, df2, df3])
print (dff)
           id       city                             name
0  1010667747     Suzhou                       Suzhou IFS
1  1010667356   Shenzhen                      Kingkey 100
2  1010667289      Wuhan                     Wuhan Center
0      190010   Shenzhen           Ping An Finance Centre
1      190012  Guangzhou     Guangzhou CTF Finance Centre
2      190015    Beijing                        China Zun
0       ZY-13   Shanghai  Shanghai World Financial Center
1       ZY-15  Hong Kong    International Commerce Centre
2       ZY-16   Changsha            Changsha IFS Tower T1

df = df.merge(dff,on = ['city', 'name'], how = 'left')
print (df)
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio          id  
0     Shenzhen           Ping An Finance Centre                51  1010667356  
1    Guangzhou     Guangzhou CTF Finance Centre                66      190010  
2     Shanghai  Shanghai World Financial Center                59      190010  
3     Shanghai  Shanghai World Financial Center                56      190012  
4     Changsha            Changsha IFS Tower T1                57       ZY-13
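With how='left', every row of df is kept, and a building that has no match in the lookup simply gets NaN in id. To see which rows failed to match, merge accepts indicator=True, which adds a _merge column. A small sketch; the building "Example Tower" is hypothetical and only added to show a miss:

```python
import pandas as pd

lookup = pd.DataFrame({"id": ["190010", "ZY-13"],
                       "city": ["Shenzhen", "Shanghai"],
                       "name": ["Ping An Finance Centre",
                                "Shanghai World Financial Center"]})
# "Example Tower" is a made-up building with no entry in the lookup
df = pd.DataFrame({"city": ["Shenzhen", "Shanghai", "Shanghai"],
                   "name": ["Ping An Finance Centre",
                            "Shanghai World Financial Center",
                            "Example Tower"]})

# indicator=True adds "_merge" with values "both", "left_only" or "right_only"
out = df.merge(lookup, on=["city", "name"], how="left", indicator=True)
print(out[["name", "id", "_merge"]])

# Rows that found no id in the lookup
missing = out[out["_merge"] == "left_only"]
print(missing["name"].tolist())  # ['Example Tower']
```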

Then merge again; to avoid duplicate columns, use rename on the lookup first:

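A sketch of that second merge, assuming a stacked lookup like dff above: renaming the lookup's city, name and id columns to matched_city, matched_name and matched_id lets the merge line up on the matched columns and brings the id in as matched_id, so no _x/_y suffixes appear. The data is trimmed to two rows (an assumption for brevity):

```python
import pandas as pd

# Trimmed lookup and combined frame (assumption for brevity)
dff = pd.DataFrame({"id": ["190010", "190012"],
                    "city": ["Shenzhen", "Guangzhou"],
                    "name": ["Ping An Finance Centre", "Guangzhou CTF Finance Centre"]})
df = pd.DataFrame({"city": ["Shenzhen", "Shenzhen"],
                   "name": ["Kingkey 100", "Ping An Finance Centre"],
                   "id": ["1010667356", "190010"],
                   "matched_city": ["Shenzhen", "Guangzhou"],
                   "matched_name": ["Ping An Finance Centre",
                                    "Guangzhou CTF Finance Centre"]})

# Map the lookup's column names onto the matched_* side of df
d = {"city": "matched_city", "name": "matched_name", "id": "matched_id"}
out = df.merge(dff.rename(columns=d),
               on=["matched_city", "matched_name"], how="left")
print(out.columns.tolist())
print(out["matched_id"].tolist())  # ['190010', '190012']
```

Because the renamed lookup no longer shares city, name or id with df, every column name in the result stays unique.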

Edit: you can first add a column holding each frame's number to every DataFrame with assign, and then also merge on that column:

dff = pd.concat([df1.assign(df_num=1), df2.assign(df_num=2), df3.assign(df_num=3)])
df = df.merge(dff,on = ['city', 'name','df_num'], how = 'left')

d = {'city':'matched_city','name':'matched_name', 'id':'matched_id','df_num':'matched_df_num'}
df5 = (df.merge(dff.rename(columns=d), 
                on = ['matched_city', 'matched_name','matched_df_num'], 
                how = 'left'))
print (df5)
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio          id  \
0     Shenzhen           Ping An Finance Centre                51  1010667356   
1    Guangzhou     Guangzhou CTF Finance Centre                66      190010   
2     Shanghai  Shanghai World Financial Center                59      190010   
3     Shanghai  Shanghai World Financial Center                56      190012   
4     Changsha            Changsha IFS Tower T1                57       ZY-13   

  matched_id  
0     190010  
1     190012  
2      ZY-13  
3      ZY-13  
4      ZY-16  
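Putting the edited approach together as one runnable script with the question's data (ids stored as strings, an assumption). Including df_num and matched_df_num in the join keys means each name is looked up only in the frame its number points to:

```python
import pandas as pd

cols = ["id", "city", "name"]
df1 = pd.DataFrame([["1010667747", "Suzhou", "Suzhou IFS"],
                    ["1010667356", "Shenzhen", "Kingkey 100"],
                    ["1010667289", "Wuhan", "Wuhan Center"]], columns=cols)
df2 = pd.DataFrame([["190010", "Shenzhen", "Ping An Finance Centre"],
                    ["190012", "Guangzhou", "Guangzhou CTF Finance Centre"],
                    ["190015", "Beijing", "China Zun"]], columns=cols)
df3 = pd.DataFrame([["ZY-13", "Shanghai", "Shanghai World Financial Center"],
                    ["ZY-15", "Hong Kong", "International Commerce Centre"],
                    ["ZY-16", "Changsha", "Changsha IFS Tower T1"]], columns=cols)

# The combined frame from the question
df = pd.DataFrame({
    "df_num": [1, 2, 2, 2, 3],
    "city": ["Shenzhen", "Shenzhen", "Shenzhen", "Guangzhou", "Shanghai"],
    "name": ["Kingkey 100", "Ping An Finance Centre", "Ping An Finance Centre",
             "Guangzhou CTF Finance Centre", "Shanghai World Financial Center"],
    "matched_df_num": [2, 2, 3, 3, 3],
    "matched_city": ["Shenzhen", "Guangzhou", "Shanghai", "Shanghai", "Changsha"],
    "matched_name": ["Ping An Finance Centre", "Guangzhou CTF Finance Centre",
                     "Shanghai World Financial Center",
                     "Shanghai World Financial Center", "Changsha IFS Tower T1"],
    "similarity_ratio": [51, 66, 59, 56, 57],
})

# Tag every lookup row with the number of the frame it came from
dff = pd.concat([df1.assign(df_num=1), df2.assign(df_num=2), df3.assign(df_num=3)],
                ignore_index=True)

# First merge: id of the building itself
df = df.merge(dff, on=["city", "name", "df_num"], how="left")

# Second merge: rename the lookup so it lines up with the matched_* columns
d = {"city": "matched_city", "name": "matched_name",
     "id": "matched_id", "df_num": "matched_df_num"}
df5 = df.merge(dff.rename(columns=d),
               on=["matched_city", "matched_name", "matched_df_num"], how="left")
print(df5[["name", "id", "matched_name", "matched_id"]])
```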

Try this; it may help you solve the problem:

import pandas as pd

# Build the three lookup frames from plain lists (np.array would have
# coerced every value, including the numeric ids, to strings)
df1 = pd.DataFrame([
    [1010667747, 'Suzhou', 'Suzhou IFS'],
    [1010667356, 'Shenzhen', 'Kingkey 100'],
    [1010667289, 'Wuhan', 'Wuhan Center']],
    columns=['id', 'city', 'name']
)
df2 = pd.DataFrame([
    [190010, 'Shenzhen', 'Ping An Finance Centre'],
    [190012, 'Guangzhou', 'Guangzhou CTF Finance Centre'],
    [190015, 'Beijing', 'China Zun']],
    columns=['id', 'city', 'name']
)
df3 = pd.DataFrame([
    ['ZY-13', 'Shanghai', 'Shanghai World Financial Center'],
    ['ZY-15', 'Hong Kong', 'International Commerce Centre'],
    ['ZY-16', 'Changsha', 'Changsha IFS Tower T1']],
    columns=['id', 'city', 'name']
)

# Tag each frame with its source number (same role as df_num in the question)
df1['df_type'] = 1
df2['df_type'] = 2
df3['df_type'] = 3

# Stack into one lookup; named dff so the question's combined df is not overwritten
dff = pd.concat([df1, df2, df3])

dff
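The stacked frame built above tags rows with df_type rather than the question's df_num, so one way to use it is to rename that column and include it in the join keys. Passing validate='many_to_one' to merge additionally makes pandas raise MergeError if the lookup ever contains two rows with the same keys, which would otherwise silently duplicate rows of df. A minimal sketch with trimmed data (an assumption for brevity):

```python
import pandas as pd

# Trimmed lookup in the shape built above (df_type marks the source frame)
lookup = pd.DataFrame({"id": ["1010667356", "190010"],
                       "city": ["Shenzhen", "Shenzhen"],
                       "name": ["Kingkey 100", "Ping An Finance Centre"],
                       "df_type": [1, 2]})
df = pd.DataFrame({"df_num": [1, 2],
                   "city": ["Shenzhen", "Shenzhen"],
                   "name": ["Kingkey 100", "Ping An Finance Centre"]})

# Align the tag column's name with the question's df_num, then merge on it
keyed = lookup.rename(columns={"df_type": "df_num"})
out = df.merge(keyed, on=["df_num", "city", "name"], how="left",
               validate="many_to_one")
print(out["id"].tolist())  # ['1010667356', '190010']

# A duplicated lookup row violates many_to_one, so merge raises MergeError
dup = pd.concat([keyed, keyed.iloc[[0]]], ignore_index=True)
try:
    df.merge(dup, on=["df_num", "city", "name"], how="left",
             validate="many_to_one")
except pd.errors.MergeError as exc:
    print("MergeError:", exc)
```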
