Python: update a DataFrame with values from another DataFrame without replacing existing values

import pandas as pd

df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['John', 'Sam', None, None],
    'id': ['A0', 'A1', 'A2', 'A3'],
})
df

   Name        email  id
0  John  1@dummy.com  A0
1   Sam  2@dummy.com  A1
2  None  3@dummy.com  A2
3  None  4@dummy.com  A3

ref_df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['', 'Sam', 'Tim', 'Sara'],
    'random': ['f', 's', 'r', 'a'],
})
ref_df

   Name        email random
0        1@dummy.com      f
1   Sam  2@dummy.com      s
2   Tim  3@dummy.com      r
3  Sara  4@dummy.com      a

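def update_df(df, index, ref_df, ref_cols, how='inner', left_on=None, right_on=None):
    # init_columns and merge are helper functions that are not shown in the question
    df = init_columns(df, cols=ref_cols)
    cols_to_keep = list(df.columns)
    # columns that exist in df but not in ref_df
    gap_cols = df.columns.difference(ref_df.columns)
    gap_df = merge(
        df[gap_cols],
        ref_df,
        how,
        left_on,
        right_on,
    )
    gap_df = gap_df[cols_to_keep].set_index(index)
    df = df.set_index(index)
    # DataFrame.update (default overwrite=True) copies every non-NA value from gap_df
    df.update(gap_df)
    df = df[cols_to_keep]
    return df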
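For comparison: the replacing most likely happens in the df.update(gap_df) step, because DataFrame.update copies every non-NA value from the other frame by default, and an empty string counts as non-NA. A minimal sketch of an alternative, assuming email is a unique key in both frames, is to pass overwrite=False, which only fills cells that are NA in the calling frame. The name result is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['John', 'Sam', None, None],
    'id': ['A0', 'A1', 'A2', 'A3'],
})
ref_df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['', 'Sam', 'Tim', 'Sara'],
    'random': ['f', 's', 'r', 'a'],
})

# Align both frames on the join key (assumed unique in each frame).
result = df.set_index('email')

# overwrite=False: only cells that are NA in `result` take values from ref_df,
# so 'John' is kept even though ref_df holds '' for that email.
result.update(ref_df.set_index('email')[['Name']], overwrite=False)

# Turn email back into an ordinary column.
result = result.reset_index()
print(result)
```

Only the Name column is passed to update, so the unrelated random column from ref_df is never copied into df.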

2 answers

User

Answer 1 · edited 2024-04-19 21:36:51

What I did was convert your ref_df into a dictionary so that we can apply a mapping.

ref_dict = dict(zip(ref_df["email"], ref_df["Name"]))
ref_dict

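This gives you:

{'1@dummy.com': '',
 '2@dummy.com': 'Sam',
 '3@dummy.com': 'Tim',
 '4@dummy.com': 'Sara'}

Then you can:

df["Name"] = df["email"].map(ref_dict)

and you will get:

   Name        email  id
0        1@dummy.com  A0
1   Sam  2@dummy.com  A1
2   Tim  3@dummy.com  A2
3  Sara  4@dummy.com  A3

This recreates the whole Name column, so existing values are overwritten as well: John becomes '' because ref_df holds an empty string for 1@dummy.com. If you are worried about changing existing values, only fill the rows where Name is NA.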
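A minimal sketch of that NA-only variant: it reuses the same ref_dict but restricts the assignment to rows where Name is missing. The variable name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['John', 'Sam', None, None],
    'id': ['A0', 'A1', 'A2', 'A3'],
})
ref_df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['', 'Sam', 'Tim', 'Sara'],
    'random': ['f', 's', 'r', 'a'],
})

ref_dict = dict(zip(ref_df["email"], ref_df["Name"]))

# Rows whose Name is missing (None/NaN); existing names are left untouched.
mask = df["Name"].isna()

# Map only those rows' emails through the dictionary.
df.loc[mask, "Name"] = df.loc[mask, "email"].map(ref_dict)
print(df)
```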

User

Answer 2 · edited 2024-04-19 21:36:51

This should work:

df['Name'] = df['Name'].fillna(df['email'].map(ref_df.set_index('email')['Name']))

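It works by building an email-to-Name mapping from ref_df and then using it to fill the gaps (NA values) in your DataFrame; names that are already present are left alone.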
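A self-contained sketch of the same approach. One assumption worth checking: fillna treats an empty string from ref_df as a real value, so a blank reference name would be copied into any row whose Name is missing. The optional where step below (not part of the answer) turns '' into NaN first; the name lookup is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['John', 'Sam', None, None],
    'id': ['A0', 'A1', 'A2', 'A3'],
})
ref_df = pd.DataFrame({
    'email': ['1@dummy.com', '2@dummy.com', '3@dummy.com', '4@dummy.com'],
    'Name': ['', 'Sam', 'Tim', 'Sara'],
    'random': ['f', 's', 'r', 'a'],
})

# email -> Name lookup Series (emails assumed unique in ref_df).
lookup = ref_df.set_index("email")["Name"]

# Optional: treat empty reference names as missing (NaN) so they never fill anything.
lookup = lookup.where(lookup != "")

# map() looks up each email; fillna() only replaces the missing Names.
df["Name"] = df["Name"].fillna(df["email"].map(lookup))
print(df)
```

Emails that do not appear in ref_df map to NaN, so their Name simply stays missing.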
