Answers to: Replace column values based on another dataframe python pandas - better way?



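Note: for simplicity I'm using a toy example, because copying/pasting dataframes on Stack Overflow is difficult (if there is an easy way, please let me know).

Is there a way to merge the values from one dataframe into another without getting the _X, _Y columns? I want the values of one column to replace all the zero values of another column.

<pre><code>df1:
Name  Nonprofit  Business  Education
X     1          1         0
Y     0          1         0    <- Y and Z have zero values for Nonprofit and Educ
Z     0          0         0
Y     0          1         0

df2:
Name  Nonprofit  Education
Y     1          1              <- this df has the correct values.
Z     1          1

pd.merge(df1, df2, on='Name', how='outer')

Name  Nonprofit_X  Business  Education_X  Nonprofit_Y  Education_Y
Y     1            1         1            1            1
Y     1            1         1            1            1
X     1            1         0            nan          nan
Z     1            1         1            1            1
</code></pre>

In a previous post I tried combine_first and dropna(), but they don't do the job.

I want to replace the zeros in df1 with the values from df2. In addition, I want every row with the same Name to be updated according to df2:

<pre><code>Name  Nonprofit  Business  Education
Y     1          1         1
Y     1          1         1
X     1          1         0
Z     1          0         1
</code></pre>

(To clarify: the value in the 'Business' column for Name = Z should be 0.)

My current solution does the following: I subset df1 on the names that exist in df2, then overwrite those rows with the correct values. However, I'd like a less hacky way to do this.

<pre><code>pubunis_df = df2
sdf = df1
regex = str_to_regex(', '.join(pubunis_df.ORGS))
pubunis = searchnamesre(sdf, 'ORGS', regex)
sdf.ix[pubunis.index, ['Education', 'Public']] = 1
searchnamesre(sdf, 'ORGS', regex)
</code></pre>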
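A minimal sketch of the behaviour asked for above, assuming Name is unique in df2. `Series.map` looks up df2's value for every row of df1 (so both Y rows are handled), and only cells that are 0 in df1 and have a match in df2 are overwritten, so no _x/_y columns appear. This is one possible approach, not the asker's code; the names `lookup`, `replacement` and `to_replace` are illustrative.

```python
import pandas as pd

# Toy data from the question, typed in by hand.
df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["Name", "Nonprofit", "Education"],
)

# Look-up table keyed by Name (assumes each Name appears once in df2).
lookup = df2.set_index("Name")

for col in lookup.columns:
    # df2's value for each row's Name; NaN where the Name is not in df2 (X).
    replacement = df1["Name"].map(lookup[col])
    # Overwrite only cells that are 0 in df1 and have a value in df2.
    to_replace = df1[col].eq(0) & replacement.notna()
    # Take df2's value where flagged, otherwise keep df1's original value.
    df1[col] = replacement.where(to_replace, df1[col])

print(df1)
```

Names missing from df2 map to NaN, which is why the columns taken from df2 come back as float64; if every cell ends up filled, `astype(int)` restores integers.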


1 Answer

Anonymous, 1 day ago


<h2>Note: with the latest versions of pandas, neither of the two answers above works any more:</h2>

KSD's answer raises an error:

<pre><code>import pandas as pd

df1 = pd.DataFrame([["X",1,1,0], ["Y",0,1,0], ["Z",0,0,0], ["Y",0,1,0]], columns=["Name","Nonprofit","Business","Education"])
df2 = pd.DataFrame([["Y",1,1], ["Z",1,1]], columns=["Name","Nonprofit","Education"])

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name), ['Nonprofit', 'Education']].values

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values

Out[851]:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)
</code></pre>

EdChum's answer gives a wrong result:

<pre><code>df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]
df1

Out[852]:
  Name  Nonprofit  Business  Education
0    X        1.0         1        0.0
1    Y        1.0         1        1.0
2    Z        NaN         0        NaN
3    Y        NaN         1        NaN
</code></pre>

In short, both approaches only work safely when the values in the 'Name' column are unique and sorted in the same order in both dataframes: `.values` drops the labels, so the two selections must have the same number of rows in the same order, and assigning a DataFrame aligns on the row index rather than on Name.

Here is my answer:

<h2>Way 1:</h2>

<pre><code># Business exists only in df1, so the merge suffixes only Nonprofit and Education.
df1 = df1.merge(df2, on='Name', how="left")
df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
df1['Education_y'] = df1['Education_y'].fillna(df1['Education_x'])
df1.drop(["Education_x", "Nonprofit_x"], inplace=True, axis=1)
df1.rename(columns={'Education_y': 'Education', 'Nonprofit_y': 'Nonprofit'}, inplace=True)
</code></pre>

<h2>Way 2:</h2>

<pre><code>df1 = df1.set_index('Name')
df2 = df2.set_index('Name')
df1.update(df2)
df1.reset_index(inplace=True)
</code></pre>

<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html" rel="nofollow noreferrer">More about update.</a> The columns you set as the index of the two dataframes don't need to have the same name before calling update; you could use "Name1" and "Name2", for example. It also works when df2 contains extra rows that df1 doesn't need: those rows simply don't update df1. In other words, the names don't have to match: df2 may contain names that df1 lacks (U below), and df1 may contain names that df2 lacks (X).

Example:

<pre><code>df1 = pd.DataFrame([["X",1,1,0], ["Y",0,1,0], ["Z",0,0,0], ["Y",0,1,0]], columns=["Name1","Nonprofit","Business","Education"])
df2 = pd.DataFrame([["Y",1,1], ["Z",1,1], ['U',1,3]], columns=["Name2","Nonprofit","Education"])

df1 = df1.set_index('Name1')
df2 = df2.set_index('Name2')

df1.update(df2)
</code></pre>

Result:

<pre><code>       Nonprofit  Business  Education
Name1
X            1.0         1        0.0
Y            1.0         1        1.0
Z            1.0         0        1.0
Y            1.0         1        1.0
</code></pre>
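A minimal sketch that generalises Way 1 so the overlapping column names don't have to be typed out by hand: it merges with `suffixes=("", "_new")` and then fills back every column that df2 shares with df1. The helper name `replace_from` and the `_new` suffix are illustrative choices, and it assumes the key is unique in df2 (a duplicated key in df2 would duplicate rows of df1 in the merge).

```python
import pandas as pd


def replace_from(left: pd.DataFrame, right: pd.DataFrame, key: str = "Name") -> pd.DataFrame:
    """Hypothetical helper: return a copy of `left` where every column shared
    with `right` takes right's value for matching keys; rows whose key is not
    in `right` keep their original values. Assumes `key` is unique in `right`."""
    value_cols = [c for c in right.columns if c != key and c in left.columns]
    merged = left.merge(right[[key] + value_cols], on=key, how="left", suffixes=("", "_new"))
    for col in value_cols:
        # Prefer right's value; fall back to left's where the key had no match.
        merged[col] = merged[col + "_new"].fillna(merged[col])
    return merged.drop(columns=[col + "_new" for col in value_cols])


df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
# 'U' is an extra row that df1 does not need; the left merge ignores it.
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1], ["U", 1, 3]],
    columns=["Name", "Nonprofit", "Education"],
)

result = replace_from(df1, df2)
print(result)
```

Like Way 1, this overwrites every matched value rather than only the zeros, keeps df1's row order (both Y rows included), and leaves the input frames unchanged.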
