Converting two DataFrames to NumPy arrays for pairwise comparison

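import gc

import numpy as np
from tqdm import notebook

# Assumed to be initialised before the loop; the original post does not show it.
list_of_rows = []

# Compare the two frames row by row, at matching positions.
for i in notebook.tqdm(range(svm_data.shape[0])):
    real_row = np.asarray(real_data.iloc[[i]].to_numpy())
    synthetic_row = np.asarray(svm_data.iloc[[i]].to_numpy())
    if np.array_equal(real_row, synthetic_row):
        continue
    else:
        # Keep only the synthetic rows that differ from the real row.
        list_of_rows.append(list(synthetic_row))
    gc.collect()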

1,0,0.0,0,0,0,0,0,0.0,2
1,0,0.0,0,0,0,0,0,0.0,2
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,0,0,0,0,0,0.0,2
1,0,0.0,0,0,0,0,0,0.0,8
1,0,0.0,0,0,0,0,0,0.0,8
1,0,0.0,0,0,0,0,0,0.0,8
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,0,0,0,0,0,0.0,2

1,0,0.0,0,0,0,0,0,0.0,2
1,0,0.0,0,0,0,0,0,0.0,3
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,0,0,0,0,0,2.0,2
1,0,0.0,0,0,0,0,0,0.0,8
1,0,0.0,0,0,1,0,0,0.0,8
1,0,0.0,0,0,0,0,0,0.0,8
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,0,0,0,0,0,0.0,4
1,0,0.0,5,0,0,0,0,0.0,4

1 answer

User
#1 · Posted on 2024-05-23 15:09:48

Let's try concat and groupby to identify the duplicated rows:

import pandas as pd

# sample data
df1 = pd.DataFrame([[1,2,3],[1,2,3],[4,5,6],[7,8,9]])
df2 = pd.DataFrame([[4,5,6],[7,8,9]])

# Stack both frames and give identical rows the same group number.
s = (pd.concat((df1,df2), keys=(1,2))
       .groupby(list(df1.columns))
       .ngroup()
    )

# `s.loc[1]` corresponds to rows in df1
# `s.loc[2]` corresponds to rows in df2
df1_in_df2 = s.loc[1].isin(s.loc[2])

df1[df1_in_df2]

Output:

   0  1  2
2  4  5  6
3  7  8  9
Update: another option is to merge against a de-duplicated df2:

df1.merge(df2.drop_duplicates(), on=list(df1.columns), indicator=True, how='left')

Output (you should be able to tell from it which rows you need):

   0  1  2     _merge
0  1  2  3  left_only
1  1  2  3  left_only
2  4  5  6       both
3  7  8  9       both
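
To make that last step explicit, here is a minimal sketch of filtering on the `_merge` column, using the same sample frames as above. The names `merged`, `only_in_df1` and `in_both` are just illustrative. It assumes both frames share the same columns and comparable dtypes.

```python
import pandas as pd

# Same sample data as in the answer above.
df1 = pd.DataFrame([[1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]])

# Left merge on every column; `_merge` records where each row was found.
merged = df1.merge(
    df2.drop_duplicates(),
    on=list(df1.columns),
    indicator=True,
    how="left",
)

# Rows of df1 with no match in df2.
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Rows of df1 that also occur in df2.
in_both = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")

print(only_in_df1)
print(in_both)
```

Note that the merged frame gets a fresh 0..n-1 index, so if df1 has a meaningful index you may want to carry it along (for example with `reset_index()` before merging).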
