<h2>注意:在最新版本的熊猫中,以上两个答案都不再有效:</h2>
<p>KSD的答案将引发错误:</p>
<pre><code>df1 = pd.DataFrame([["X",1,1,0],
["Y",0,1,0],
["Z",0,0,0],
["Y",0,0,0]],columns=["Name","Nonprofit","Business", "Education"])
df2 = pd.DataFrame([["Y",1,1],
["Z",1,1]],columns=["Name","Nonprofit", "Education"])
df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name),['Nonprofit', 'Education']].values
df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values
Out[851]:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)
</code></pre>
<p>EdChum的回答会给我们一个错误的结果:</p>
<pre><code> df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]
df1
Out[852]:
Name Nonprofit Business Education
0 X 1.0 1 0.0
1 Y 1.0 1 1.0
2 Z NaN 0 NaN
3 Y NaN 1 NaN
</code></pre>
<p>好吧,只有当列“Name”中的值是唯一的并且在两个数据帧中都排序时,它才能安全地工作。</p>
<p><strong>我的答案是:</strong></p>
<h2><strong>方式1:</strong></h2>
<pre><code>df1 = df1.merge(df2,on='Name',how="left")
df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
df1['Business_y'] = df1['Business_y'].fillna(df1['Business_x'])
df1.drop(["Business_x","Nonprofit_x"],inplace=True,axis=1)
df1.rename(columns={'Business_y':'Business','Nonprofit_y':'Nonprofit'},inplace=True)
</code></pre>
<h2><strong>方式2:</strong></h2>
<pre><code>df1 = df1.set_index('Name')
df2 = df2.set_index('Name')
df1.update(df2)
df1.reset_index(inplace=True)
</code></pre>
<p><a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html" rel="nofollow noreferrer">More guide about update.</a>。需要设置索引的两个数据帧的列名在“update”之前不必相同。你可以试试“Name1”和“Name2”。而且,即使df2中有其他不必要的行,它也可以工作,这不会更新df1。换句话说,df2不需要是df1的超集。</p>
<p>示例:</p>
<pre><code>df1 = pd.DataFrame([["X",1,1,0],
["Y",0,1,0],
["Z",0,0,0],
["Y",0,1,0]],columns=["Name1","Nonprofit","Business", "Education"])
df2 = pd.DataFrame([["Y",1,1],
["Z",1,1],
['U',1,3]],columns=["Name2","Nonprofit", "Education"])
df1 = df1.set_index('Name1')
df2 = df2.set_index('Name2')
df1.update(df2)
</code></pre>
<p>结果:</p>
<pre><code> Nonprofit Business Education
Name1
X 1.0 1 0.0
Y 1.0 1 1.0
Z 1.0 0 1.0
Y 1.0 1 1.0
</code></pre>