Conditionally updating from another DataFrame

Posted 2024-04-20 02:50:39


I have two DataFrames and need to conditionally update specific columns in the first one.

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df1)

   Key identifier  A  B  C   D   E   F
0    1        Foo  1  1  1 NaN NaN NaN
1    2        Foo  2  2  2 NaN NaN NaN
2    3        Bar  3  3  3 NaN NaN NaN

df2 = pd.DataFrame([[1,np.nan,10,10,10,5,6,7],[2,np.nan,12,12,12,8,9,10],[3,np.nan,13,13,13,11,12,13]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df2)

   Key  identifier   A   B   C   D   E   F
0    1         NaN  10  10  10   5   6   7
1    2         NaN  12  12  12   8   9  10
2    3         NaN  13  13  13  11  12  13

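Where the identifier column in df1 equals 'Foo', I need to update columns D, E, and F of df1 with the corresponding columns from df2. How can I conditionally update these three columns?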

df3 = #code here

Expected output:

print(df3)

   Key identifier  A  B  C    D    E     F
0    1        Foo  1  1  1  5.0  6.0   7.0
1    2        Foo  2  2  2  8.0  9.0  10.0
2    3        Bar  3  3  3  NaN  NaN   NaN

Follow-up

Put differently, suppose df1 looks like this:

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[4,'Bar',4,4,4,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

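Now df1 and df2 have different lengths, and the positions of the records to update no longer line up. How can this still be made to work? I get the following output: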

df2[df1['identifier'] == 'Foo'].combine_first(df1)

   Key identifier     A     B     C     D     E     F
0  1.0        Foo  10.0  10.0  10.0   5.0   6.0   7.0
1  4.0        Bar   4.0   4.0   4.0   NaN   NaN   NaN
2  3.0        Foo  13.0  13.0  13.0  11.0  12.0  13.0
3  3.0        Bar   3.0   3.0   3.0   NaN   NaN   NaN

1 answer
User
#1 · Posted 2024-04-20 02:50:39

Use combine_first after moving Key into the index with set_index.
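The indexing step itself is not shown in the answer. A minimal sketch of it, assuming df1 and df2 are built as in the question and that Key uniquely identifies each row (the `cols` name is only a helper introduced here):

```python
import numpy as np
import pandas as pd

# `cols` is a helper name for this sketch, not part of the original question.
cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

# Move Key into the index so that later operations align rows by Key
# rather than by row position.
df1 = df1.set_index('Key')
df2 = df2.set_index('Key')
```

With Key as the index on both frames, the two DataFrames look like this: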

df1

    identifier  A  B  C   D   E   F
Key                                
1          Foo  1  1  1 NaN NaN NaN
2          Foo  2  2  2 NaN NaN NaN
3          Bar  3  3  3 NaN NaN NaN

df2

     identifier   A   B   C   D   E   F
Key                                    
1           NaN  10  10  10   5   6   7
2           NaN  12  12  12   8   9  10
3           NaN  13  13  13  11  12  13

df2[df1.eval('identifier == "Foo"')].combine_first(df1)

    identifier     A     B     C    D    E     F
Key                                             
1          Foo  10.0  10.0  10.0  5.0  6.0   7.0
2          Foo  12.0  12.0  12.0  8.0  9.0  10.0
3          Bar   3.0   3.0   3.0  NaN  NaN   NaN
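Note that the result above also overwrites A, B, and C for the 'Foo' rows, whereas the question asks for only D, E, and F to change. One possible variation, offered as a sketch rather than as the answerer's method, restricts the update to those three columns and selects rows by Key, so it also covers the follow-up case where df1 has an extra row in a different order. It assumes Key is unique in both frames; `update_cols`, `left`, `right`, and `keys` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
# df1 from the follow-up: four rows, with Key 4 inserted before Key 2.
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

update_cols = ['D', 'E', 'F']  # only these columns should be overwritten

# Index both frames by Key so rows are matched by Key, not by position.
left = df1.set_index('Key')
right = df2.set_index('Key')

# Keys of the 'Foo' rows in df1 that also exist in df2.
keys = left.index[left['identifier'] == 'Foo'].intersection(right.index)

# Copy df1 and overwrite D, E, F for the selected Keys only.
df3 = left.copy()
df3.loc[keys, update_cols] = right.loc[keys, update_cols]
df3 = df3.reset_index()  # restore Key as an ordinary column

print(df3)
```

Here the rows with Key 1 and 2 take their D, E, and F values from df2, while the 'Bar' rows and columns A, B, and C keep the values from df1.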
