Python pandas: getting matching and non-matching records between two DataFrames

User

Answer 1 · edited 2024-04-26 10:16:42

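The simple answer to your question is df1.where:

Note: rows that come back as NaN did not satisfy the condition, i.e. their Salary differs between the two DataFrames. Rows that keep their actual values have the same Salary in both DataFrames.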

>>> df1.where(df1.Salary==df2.Salary)
          DoB   ID  Name    Salary
0  12-05-1996  1    AAA  100000.0
1  16-08-1997  2    BBB  200000.0
2  24-04-1998  3    CCC  389999.0
3         NaN  NaN  NaN       NaN
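To try this out, here is a minimal sketch. The sample data is an assumption reconstructed from the outputs in this answer, and the last row of df1 (Salary 450000) is invented purely for illustration. It also shows plain boolean indexing, which may be more convenient than where if you want the matching and non-matching rows as separate frames:

```python
import pandas as pd

# Hypothetical sample data; the last Salary value in df1 is an assumption.
df1 = pd.DataFrame({
    "DoB": ["12-05-1996", "16-08-1997", "24-04-1998", "05-09-2000"],
    "ID": [1, 2, 3, 4],
    "Name": ["AAA", "BBB", "CCC", "DDD"],
    "Salary": [100000, 200000, 389999, 450000],
})
df2 = df1.assign(Salary=[100000, 200000, 389999, 540000])

# Row-by-row comparison; both frames must share the same index.
same_salary = df1["Salary"] == df2["Salary"]

masked = df1.where(same_salary)   # same shape as df1, non-matching rows are NaN
matched = df1[same_salary]        # only the matching rows
unmatched = df1[~same_salary]     # only the non-matching rows
```

Keep in mind that the comparison is positional by index label, so it only makes sense when both DataFrames list the same records in the same order.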

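Using pd.merge: if you merge df1 and df2 without specifying any columns or index levels, it defaults to joining on the intersection of the columns in both DataFrames.

>>> pd.merge(df1, df2)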

If you want to join on specific columns or index levels, use on.

>>> pd.merge(df1, df2, on="Salary")
        DoB_x  ID_x Name_x  Salary       DoB_y  ID_y Name_y
0  12-05-1996     1    AAA  100000  12-05-1996     1    AAA
1  16-08-1997     2    BBB  200000  16-08-1997     2    BBB
2  24-04-1998     3    CCC  389999  24-04-1998     3    CCC
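If you need both the matching and the non-matching rows from a single call, merge's indicator parameter may help. A minimal sketch, again assuming sample data shaped like the outputs above (the last row of df1 is invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; the last Salary value in df1 is an assumption.
df1 = pd.DataFrame({
    "DoB": ["12-05-1996", "16-08-1997", "24-04-1998", "05-09-2000"],
    "ID": [1, 2, 3, 4],
    "Name": ["AAA", "BBB", "CCC", "DDD"],
    "Salary": [100000, 200000, 389999, 450000],
})
df2 = df1.assign(Salary=[100000, 200000, 389999, 540000])

# Outer merge on all shared columns; "_merge" records where each row came from.
merged = pd.merge(df1, df2, how="outer", indicator=True)

matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
only_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_df2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
```

Unlike the where approach, this compares rows by value rather than by position, so the two DataFrames do not need to be in the same order.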

For the non-matching rows in df2, you can use isin with a dict:

>>> df2[~df2.isin(df1.to_dict('list')).all(axis=1)]
          DoB  ID Name  Salary
3  05-09-2000   4  DDD  540000

Another way, suggested by 梅布尔:

df2[~df2.isin(df1).all(axis=1)]
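The two isin forms are not equivalent, which is easy to miss. With a dict, each column is checked for membership independently of row position; with a DataFrame, both the index and the column labels have to match. A small sketch under the same assumed sample data (the last row of df1 is invented, and df1_reversed is a hypothetical reordered copy used only to show the difference):

```python
import pandas as pd

# Hypothetical sample data; the last Salary value in df1 is an assumption.
df1 = pd.DataFrame({
    "DoB": ["12-05-1996", "16-08-1997", "24-04-1998", "05-09-2000"],
    "ID": [1, 2, 3, 4],
    "Name": ["AAA", "BBB", "CCC", "DDD"],
    "Salary": [100000, 200000, 389999, 450000],
})
df2 = df1.assign(Salary=[100000, 200000, 389999, 540000])

# Same rows as df1, but in reverse order with a fresh 0..3 index.
df1_reversed = df1.iloc[::-1].reset_index(drop=True)

# Dict form: per-column membership, row order in df1 does not matter.
by_values = df2[~df2.isin(df1_reversed.to_dict("list")).all(axis=1)]

# DataFrame form: compares cell by cell at the same index and column label.
by_labels = df2[~df2.isin(df1_reversed).all(axis=1)]
```

Here by_values still contains only the DDD row, while by_labels contains all four rows because nothing lines up by index any more. The dict form has its own caveat: a row whose values each appear somewhere in df1, but in different rows, would still count as a match.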

User

Answer 2 · edited 2024-04-26 10:16:42

My solution is a bit different: just copy the salary column over from the other dataset.

For example:

DF1["Salary2"] = DF2["Salary"]

MatchDF = DF1[DF1["Salary"] == DF1["Salary2"]]
MisMatchDF = DF1[DF1["Salary"] != DF1["Salary2"]]
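One thing to watch: the column assignment aligns on the index, so it assumes both DataFrames list the same records under the same index labels. If that is not guaranteed, a possible variation is to look the salary up by a key column. This is only a sketch, assuming ID is unique in DF2 and using made-up data:

```python
import pandas as pd

# Hypothetical data: same IDs in a different row order, one salary differs.
DF1 = pd.DataFrame({"ID": [1, 2, 3, 4],
                    "Salary": [100000, 200000, 389999, 450000]})
DF2 = pd.DataFrame({"ID": [4, 3, 2, 1],
                    "Salary": [540000, 389999, 200000, 100000]})

# Look up each DF1 row's salary in DF2 by ID instead of by row position.
salary_by_id = DF2.set_index("ID")["Salary"]
DF1["Salary2"] = DF1["ID"].map(salary_by_id)

MatchDF = DF1[DF1["Salary"] == DF1["Salary2"]]
MisMatchDF = DF1[DF1["Salary"] != DF1["Salary2"]]
```

IDs that are missing from DF2 get NaN in Salary2 and therefore end up in MisMatchDF.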

User

Answer 3 · edited 2024-04-26 10:16:42

import numpy as np

# pick index keys and compare column(s)
keys = ['ID', 'Name']
# if comparing all columns:
col_list = [col for col in df1.columns if col not in keys]
# # if comparing specific columns:
# col_list = ['Salary', 'DoB']

# extend keys with col_list for next step
sel_cols = keys.copy()
sel_cols.extend(col_list)

# set a multi-index with keys
# to dataframes with col_list columns
dfa = df1[sel_cols].set_index(keys)
dfb = df2[sel_cols].set_index(keys)

# make an equivalency boolean mask
dfa.update(dfb)
mask = np.equal(df1[col_list].values, dfa.values).all(axis=1)

# slice df1 with mask
Match_df = df1[mask]
Mismatch_df = df1[~mask]
