Merging DataFrames without duplicating columns

Posted 2024-04-19 09:50:03


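I need to merge two DataFrames without creating duplicate columns. The first DataFrame (dfa) has missing values. The second DataFrame (dfb) has a unique row for each key that is missing in dfa. In Excel this would be a VLOOKUP.

dfa looks like this: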

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       NaN  NaN
LS1       NaN  NaN
LS2       NaN  NaN
M21       2.4  0.3

dfb contains only unique postcodes, with values for the postcodes whose lat and lon are NaN in dfa. It looks like this:

postcode  lat  lon 
LS1       1.4  0.1
LS2       1.5  0.2

My desired output is:

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       1.4  0.1
LS1       1.4  0.1
LS2       1.5  0.2
M21       2.4  0.3

I tried using pd.merge like this:

outputdf = pd.merge(dfa, dfb, on='postcode', how='left')

This creates duplicate columns:

postcode  lat_x  lon_x  lat_y  lon_y ...plus 32 more columns
M20       2.3    0.2    NaN    NaN
LS1       NaN    NaN    1.4    0.1
LS1       NaN    NaN    1.4    0.1
LS2       NaN    NaN    1.5    0.2
M21       2.4    0.3    NaN    NaN
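One way to keep using merge is to collapse the duplicated pairs afterwards. The sketch below assumes a cut-down dfa with only postcode, lat and lon (the 32 other columns are omitted), and treats every non-key column of dfb as a column to fill. `suffixes=("", "_dfb")` keeps dfa's column names unchanged and tags only dfb's copies; the `_dfb` suffix is an arbitrary, hypothetical choice.

```python
import numpy as np
import pandas as pd

# Cut-down versions of the question's frames (extra columns omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Columns to fill: everything in dfb except the join key.
shared = [c for c in dfb.columns if c != "postcode"]

# A left merge keeps dfa's row order; dfb's overlapping columns get "_dfb".
merged = dfa.merge(dfb, on="postcode", how="left", suffixes=("", "_dfb"))

# Fill the gaps in dfa's columns from dfb's copies, then drop the copies.
for col in shared:
    merged[col] = merged[col].fillna(merged[col + "_dfb"])
merged = merged.drop(columns=[col + "_dfb" for col in shared])

print(merged)
```

Because dfb has one row per postcode, the left merge keeps exactly one output row per dfa row.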

Starting from this answer, I tried:

output = dfa
for df in [dfa, dfb]:
    output.update(df.set_index('postcode'))

but I get "ValueError: cannot reindex from a duplicate axis".
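A likely cause: DataFrame.update has to align `other` with the caller's index, and on the first loop iteration `other` is `dfa.set_index('postcode')`, whose index contains LS1 twice. The minimal sketch below reproduces that failure mode with a plain Series (the labels mirror the question's data); the exact error wording is an assumption and differs between pandas versions ("cannot reindex from a duplicate axis" in older releases, "cannot reindex on an axis with duplicate labels" in newer ones).

```python
import pandas as pd

# "LS1" appears twice, as it does in dfa.set_index("postcode").
s = pd.Series([2.3, 1.0, 2.0], index=["M20", "LS1", "LS1"])

raised = False
try:
    # Reindexing needs a unique source index to know which row each label maps to.
    s.reindex(["LS1", "LS2"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```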

Also taken from that answer, this doesn't work:

output.merge(pd.concat([dfa, dfb]), how='left')

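There are no duplicate columns, but the values in lat and lon are still empty (NaN).

Is there a way to merge on postcode without creating duplicate columns, effectively doing a VLOOKUP in pandas?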
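The closest pandas analogue of VLOOKUP is probably Series.map against a lookup Series indexed by the key, combined with fillna so that only missing cells are replaced. This is a minimal sketch, not taken from the answers below; it assumes cut-down frames (postcode, lat, lon only) and requires dfb's postcodes to be unique, which the question states.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# One row per postcode, so indexing dfb by postcode gives a lookup table.
lookup = dfb.set_index("postcode")

# For each lookup column: find every dfa postcode in dfb (the VLOOKUP step),
# and use the found value only where dfa's own value is missing.
for col in lookup.columns:
    dfa[col] = dfa[col].fillna(dfa["postcode"].map(lookup[col]))

print(dfa)
```

Unlike merge, this adds no columns and keeps dfa's index and row order; postcodes absent from dfb (M20, M21) map to NaN, so fillna leaves their existing values alone.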


2 Answers

DataFrame.combine_first(self, other) seems to be the best solution.

If you want a one-liner that doesn't modify the input DataFrames:

df1.set_index('postcode').combine_first(df2.set_index('postcode'))

If you need to keep df1's original index:

(df1.reset_index()
    .set_index('postcode')
    .combine_first(df2.set_index('postcode'))
    .reset_index()
    .set_index('index')
    .sort_index())

Not elegant, but it works.
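The index round-trip above is needed because combine_first aligns on the union of the two indexes, which appears to return the rows sorted by postcode. If the aim is only to keep df1's rows and index as they are, a possible alternative (a sketch, assuming df1 has a unique index such as the default RangeIndex; the name `lookup` is hypothetical) is to build a frame of dfb values aligned to df1's rows and pass it to DataFrame.fillna, which fills only the missing cells.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
df2 = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Left-merge only the key column: one lookup row per df1 row, in df1's order.
lookup = df1[["postcode"]].merge(df2, on="postcode", how="left")
# Give the lookup df1's index so fillna aligns it row by row.
lookup.index = df1.index

# fillna with a DataFrame fills each NaN cell from the matching cell in lookup.
result = df1.fillna(lookup)
print(result)
```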

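Use DataFrame.combine_first with postcode set as the index of both DataFrames, then, if necessary, add DataFrame.reindex to put the columns back in the same order as in the original df1: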

print (df1)
  postcode  lat  lon  plus  32  more  columns
0      M20  2.3  0.2   NaN NaN   NaN      NaN
1      LS1  NaN  NaN   NaN NaN   NaN      NaN
2      LS1  NaN  NaN   NaN NaN   NaN      NaN
3      LS2  NaN  NaN   NaN NaN   NaN      NaN
4      M21  2.4  0.3   NaN NaN   NaN      NaN

df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')

df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
          lat  lon  plus  32  more  columns
postcode                                   
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS2       1.5  0.2   NaN NaN   NaN      NaN
M20       2.3  0.2   NaN NaN   NaN      NaN
M21       2.4  0.3   NaN NaN   NaN      NaN
