Drop rows in pandas whose combination of values in two columns does not occur at least twice in the data

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['28/02/2017', 'Apple'], ['28/02/2017', 'Apple'], ['31/03/2017', 'Apple'],
                             ['28/02/2017', 'IBM'], ['28/02/2017', 'WalMart'], ['28/02/2017', 'WalMart'],
                             ['03/07/2017', 'WalMart']]),
                   columns=['date', 'keyword'])

Desired result (only the date/keyword pairs that occur at least twice are kept):

df2 = pd.DataFrame(np.array([['28/02/2017', 'Apple'], ['28/02/2017', 'Apple'],
                             ['28/02/2017', 'WalMart'], ['28/02/2017', 'WalMart']]),
                   columns=['date', 'keyword'])
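Before picking an approach, it can help to count how often each (date, keyword) pair occurs. This sketch rebuilds df1 from a plain dict (an equivalent construction, chosen only for readability) and lists the pairs that reach the threshold of two; the names counts and frequent are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['28/02/2017', '28/02/2017', '31/03/2017', '28/02/2017',
             '28/02/2017', '28/02/2017', '03/07/2017'],
    'keyword': ['Apple', 'Apple', 'Apple', 'IBM', 'WalMart', 'WalMart', 'WalMart'],
})

# Number of rows for each (date, keyword) combination
counts = df1.groupby(['date', 'keyword']).size()
print(counts)

# Combinations that occur at least twice; only their rows should survive
frequent = counts[counts >= 2]
print(frequent.index.tolist())  # [('28/02/2017', 'Apple'), ('28/02/2017', 'WalMart')]
```

Only the Apple and WalMart rows dated 28/02/2017 qualify, which matches df2.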

2 Answers

User

Answer 1 · edited 2024-04-25 01:13:43

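# group_keys=False keeps the original row index (pandas >= 2.0 otherwise prepends the group keys)
df1.groupby(['date', 'keyword'], group_keys=False).apply(lambda x: x if len(x) >= 2 else None).dropna()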

Output

         date  keyword
0  28/02/2017    Apple
1  28/02/2017    Apple
4  28/02/2017  WalMart
5  28/02/2017  WalMart

User

Answer 2 · edited 2024-04-25 01:13:43

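Use DataFrame.duplicated with subset to specify which columns to check for duplicates, and keep=False to flag every duplicated row (not only the repeats), then select those rows with boolean indexing: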

df2 = df1[df1.duplicated(subset=['date','keyword'], keep=False)]
print(df2)
         date  keyword
0  28/02/2017    Apple
1  28/02/2017    Apple
4  28/02/2017  WalMart
5  28/02/2017  WalMart
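The keep=False argument matters here. With the default keep='first', duplicated leaves the first row of each repeated pair unflagged, so half of the wanted rows would be lost. This small sketch (mask_first and mask_all are illustrative names) compares the two settings on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['28/02/2017', '28/02/2017', '31/03/2017', '28/02/2017',
             '28/02/2017', '28/02/2017', '03/07/2017'],
    'keyword': ['Apple', 'Apple', 'Apple', 'IBM', 'WalMart', 'WalMart', 'WalMart'],
})
cols = ['date', 'keyword']

# keep='first' (the default): the first row of each duplicated group is NOT flagged
mask_first = df1.duplicated(subset=cols)
# keep=False: every row belonging to a duplicated group is flagged
mask_all = df1.duplicated(subset=cols, keep=False)

print(df1[mask_first].index.tolist())  # [1, 5]
print(df1[mask_all].index.tolist())    # [0, 1, 4, 5]
```

Note that duplicated can only express "occurs more than once", so it covers a threshold of exactly two; other thresholds need a group count.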

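If you need a specific minimum number of occurrences, count the rows in each group with groupby and transform('size'), then compare the count with the threshold: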

df2 = df1[df1.groupby(['date','keyword'])['date'].transform('size') >= 2]
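The transform('size') line generalizes to any threshold. As a sketch, it can be wrapped in a small function; drop_rare_pairs is a hypothetical helper written for illustration, not part of pandas:

```python
import pandas as pd

def drop_rare_pairs(df, cols, min_count):
    """Keep only rows whose combination of values in `cols` occurs at least
    `min_count` times. Hypothetical helper, not a pandas API."""
    # Size of each row's group, broadcast back onto the original row index
    group_size = df.groupby(cols)[cols[0]].transform('size')
    return df[group_size >= min_count]

df1 = pd.DataFrame({
    'date': ['28/02/2017', '28/02/2017', '31/03/2017', '28/02/2017',
             '28/02/2017', '28/02/2017', '03/07/2017'],
    'keyword': ['Apple', 'Apple', 'Apple', 'IBM', 'WalMart', 'WalMart', 'WalMart'],
})

print(drop_rare_pairs(df1, ['date', 'keyword'], 2).index.tolist())  # [0, 1, 4, 5]
print(drop_rare_pairs(df1, ['date', 'keyword'], 3).empty)           # True
```

Because transform returns a Series aligned with df1, the comparison yields a boolean mask that can be used directly for indexing, with no Python-level loop over groups.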

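If the DataFrame is small or performance does not matter, use filter (it calls the Python function once per group, so it is slower than transform on large data):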

df2 = df1.groupby(['date','keyword']).filter(lambda x: len(x) >= 2)
print(df2)
         date  keyword
0  28/02/2017    Apple
1  28/02/2017    Apple
4  28/02/2017  WalMart
5  28/02/2017  WalMart
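As a quick sanity check, the three approaches from this answer can be run side by side on the same data. This sketch assumes the threshold of two used throughout and confirms they keep the same rows, in the original order, with the original index:

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['28/02/2017', '28/02/2017', '31/03/2017', '28/02/2017',
             '28/02/2017', '28/02/2017', '03/07/2017'],
    'keyword': ['Apple', 'Apple', 'Apple', 'IBM', 'WalMart', 'WalMart', 'WalMart'],
})
cols = ['date', 'keyword']

via_duplicated = df1[df1.duplicated(subset=cols, keep=False)]
via_transform = df1[df1.groupby(cols)['date'].transform('size') >= 2]
via_filter = df1.groupby(cols).filter(lambda g: len(g) >= 2)

# All three should select identical rows
assert via_duplicated.equals(via_transform)
assert via_transform.equals(via_filter)
print(via_filter.index.tolist())  # [0, 1, 4, 5]
```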
