替换基于另一个dataframe python pandas的列值-更好的方法？

df1: Name Nonprofit Business Education X 1 1 0 Y 0 1 0 <- Y and Z have zero values for Nonprofit and Educ Z 0 0 0 Y 0 1 0 df2: Name Nonprofit Education Y 1 1 <- this df has the correct values. Z 1 1 pd.merge(df1, df2, on='Name', how='outer') Name Nonprofit_X Business Education_X Nonprofit_Y Education_Y Y 1 1 1 1 1 Y 1 1 1 1 1 X 1 1 0 nan nan Z 1 1 1 1 1

pubunis_df = df2 sdf = df1 regex = str_to_regex(', '.join(pubunis_df.ORGS)) pubunis = searchnamesre(sdf, 'ORGS', regex) sdf.ix[pubunis.index, ['Education', 'Public']] = 1 searchnamesre(sdf, 'ORGS', regex)

3条回答

网友

1楼 · 编辑于 2024-05-16 18:33:33

在[27]中：这是正确的。

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']].values

df
Out[27]:

Name  Nonprofit  Business  Education

0    X          1         1          0
1    Y          1         1          1
2    Z          1         0          1
3    Y          1         1          1

[4行x 4列]

只有当df1中的所有行都存在于df中时，上述操作才有效。换句话说，df应该是df1的超集

如果在df1中有一些与df不匹配的行，则应遵循以下步骤

换句话说，df不是df1的超集：

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = 
df1.loc[df1.Name.isin(df.Name),['Nonprofit', 'Education']].values

网友

2楼 · 编辑于 2024-05-16 18:33:33

注意：在最新版本的熊猫中，以上两个答案都不再有效：

KSD的答案将引发错误：

df1 = pd.DataFrame([["X",1,1,0],
              ["Y",0,1,0],
              ["Z",0,0,0],
              ["Y",0,0,0]],columns=["Name","Nonprofit","Business", "Education"])    

df2 = pd.DataFrame([["Y",1,1],
              ["Z",1,1]],columns=["Name","Nonprofit", "Education"])   

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name),['Nonprofit', 'Education']].values

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values

Out[851]:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)

EdChum的回答会给我们一个错误的结果：

 df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]

df1
Out[852]: 
  Name  Nonprofit  Business  Education
0    X        1.0         1        0.0
1    Y        1.0         1        1.0
2    Z        NaN         0        NaN
3    Y        NaN         1        NaN

好吧，只有当列“Name”中的值是唯一的并且在两个数据帧中都排序时，它才能安全地工作。

我的答案是：

方式1：

df1 = df1.merge(df2,on='Name',how="left")
df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
df1['Business_y'] = df1['Business_y'].fillna(df1['Business_x'])
df1.drop(["Business_x","Nonprofit_x"],inplace=True,axis=1)
df1.rename(columns={'Business_y':'Business','Nonprofit_y':'Nonprofit'},inplace=True)

方式2：

df1 = df1.set_index('Name')
df2 = df2.set_index('Name')
df1.update(df2)
df1.reset_index(inplace=True)

More guide about update.。需要设置索引的两个数据帧的列名在“update”之前不必相同。你可以试试“Name1”和“Name2”。而且，即使df2中有其他不必要的行，它也可以工作，这不会更新df1。换句话说，df2不需要是df1的超集。

示例：

df1 = pd.DataFrame([["X",1,1,0],
              ["Y",0,1,0],
              ["Z",0,0,0],
              ["Y",0,1,0]],columns=["Name1","Nonprofit","Business", "Education"])    

df2 = pd.DataFrame([["Y",1,1],
              ["Z",1,1],
              ['U',1,3]],columns=["Name2","Nonprofit", "Education"])   

df1 = df1.set_index('Name1')
df2 = df2.set_index('Name2')


df1.update(df2)

结果：

      Nonprofit  Business  Education
Name1                                
X           1.0         1        0.0
Y           1.0         1        1.0
Z           1.0         0        1.0
Y           1.0         1        1.0

网友

3楼 · 编辑于 2024-05-16 18:33:33

使用^{}中的布尔掩码筛选df并从rhs df中分配所需的行值：

In [27]:

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
  Name  Nonprofit  Business  Education
0    X          1         1          0
1    Y          1         1          1
2    Z          1         0          1
3    Y          1         1          1

[4 rows x 4 columns]

注意：在最新版本的熊猫中，以上两个答案都不再有效：

方式1：

方式2：

相关问题更多 >

编程相关推荐

热门问题

热门文章