Remove duplicate rows based on the maximum value of another column - Q&A - Python中文网


Posted 2024-04-25 16:53:19



I have a pandas DataFrame like the one below.

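In the df below, duplicate values in columns X & Y occur at indices 0, 1 & 2, 3, 4 … & 500, 501, 502. When the second round starts, the same X & Y duplicates recur at indices 1000, 1001 & 1002, 1003 & … 1200, 1201, and so on, but with different values in the weight column.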

index     x         y         weight
0         59.644    10.72     0.69
1         59.644    10.72     0.82
2         57.822    10.13     0.75
3         57.822    10.13     0.68
4         57.822    10.13     0.20
.
.
500       53.252    10.85     0.15
501       53.252    10.85     0.95
502       53.252    10.85     0.69
.
.
1000      59.644    10.72     0.85
1001      59.644    10.72     0.73
1002      57.822    10.13     0.92
1003      57.822    10.13     0.15
.
.
.
1200      53.252    10.85     0.78
1201      53.252    10.85     1.098

My requirements

I would like my df to:
1) Drop rows with duplicate X & Y values whose weight is less than 0.60.
2) Duplicates in the X & Y columns still remain after that, so I then want to compare the weights among the duplicate rows and remove the rows with the lower weight.
3) If I use the code below, it removes all the duplicates of X & Y across the whole df:

df_2.groupby(['x', 'y'], as_index=False, sort=False)['weight'].max()

But I want to compare only the first run of duplicates and remove the lower-weight rows, then the second run, then the third, and so on, so that the same duplicate values can appear again further down the df. For a better understanding, please refer to the required df below.

What the df should look like:

index     x         y         weight
1         59.644    10.72     0.82
2         57.822    10.13     0.75
.
.
501       53.252    10.85     0.95
.
.
1000      59.644    10.72     0.85
.
1002      57.822    10.13     0.92
.
.
1201      53.252    10.85     1.098
.
.
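To make the difference concrete, here is a minimal sketch that runs the global groupby from the question next to a grouping by runs of consecutive rows. It assumes the DataFrame holds only the visible rows of the table above (the elided "." rows are left out) and uses the lowercase column names x, y and weight shown in the table.

```python
import pandas as pd

# Visible rows of the question's table only; the elided rows are left out.
df = pd.DataFrame(
    {
        "x": [59.644, 59.644, 57.822, 57.822, 57.822, 53.252, 53.252, 53.252,
              59.644, 59.644, 57.822, 57.822, 53.252, 53.252],
        "y": [10.72, 10.72, 10.13, 10.13, 10.13, 10.85, 10.85, 10.85,
              10.72, 10.72, 10.13, 10.13, 10.85, 10.85],
        "weight": [0.69, 0.82, 0.75, 0.68, 0.20, 0.15, 0.95, 0.69,
                   0.85, 0.73, 0.92, 0.15, 0.78, 1.098],
    },
    index=[0, 1, 2, 3, 4, 500, 501, 502, 1000, 1001, 1002, 1003, 1200, 1201],
)

# Global grouping: every (x, y) pair collapses to a single row, so the
# second round that starts at index 1000 is merged into the first one.
global_max = df.groupby(["x", "y"], as_index=False, sort=False)["weight"].max()
print(global_max)

# Run-based grouping: a new run starts whenever x or y differs from the row above.
run_id = (df[["x", "y"]] != df[["x", "y"]].shift()).any(axis=1).cumsum()
per_run = df[df.groupby(run_id)["weight"].transform("max") == df["weight"]]
print(per_run)
```

On these rows the global groupby returns 3 rows, one per (x, y) pair, while the run-based filter keeps indices 1, 2, 501, 1000, 1002 and 1201, which matches the required df.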

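I tried using if statements, but the code kept getting longer. I think there should be a simpler, more Pythonic way to do this (built-in functions or numpy). Any help would be appreciated.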
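For the built-in route mentioned here: the standard library's itertools.groupby only groups adjacent items with equal keys, which is exactly the "consecutive duplicates" behaviour wanted. A minimal pure-Python sketch, assuming the rows are (index, x, y, weight) tuples in table order (elided rows omitted); the function name max_weight_per_run is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (index, x, y, weight) tuples: the visible rows of the question's table only.
rows = [
    (0, 59.644, 10.72, 0.69), (1, 59.644, 10.72, 0.82),
    (2, 57.822, 10.13, 0.75), (3, 57.822, 10.13, 0.68), (4, 57.822, 10.13, 0.20),
    (500, 53.252, 10.85, 0.15), (501, 53.252, 10.85, 0.95), (502, 53.252, 10.85, 0.69),
    (1000, 59.644, 10.72, 0.85), (1001, 59.644, 10.72, 0.73),
    (1002, 57.822, 10.13, 0.92), (1003, 57.822, 10.13, 0.15),
    (1200, 53.252, 10.85, 0.78), (1201, 53.252, 10.85, 1.098),
]


def max_weight_per_run(rows):
    """Return the heaviest row of every run of consecutive rows with equal (x, y)."""
    # groupby merges only *adjacent* rows with the same (x, y) key, so a pair
    # that reappears later (e.g. at index 1000) starts a new run.
    return [max(run, key=itemgetter(3)) for _, run in groupby(rows, key=itemgetter(1, 2))]


kept = max_weight_per_run(rows)
for row in kept:
    print(row)
```

max() returns the first maximal row on a tie, so exactly one row survives per run. The 0.60 threshold from requirement 1 is not applied in this sketch.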


1 answer

Anonymous user
#1 · Posted 2024-04-25 16:53:19

As @Erfan mentioned in the comments, you need to group by runs of consecutive values using helper Series:
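Each helper Series is a run label. A minimal sketch of what one of them contains, using only the visible x values from the question's table:

```python
import pandas as pd

# x column of the visible rows (elided rows left out).
x = pd.Series([59.644, 59.644, 57.822, 57.822, 57.822, 53.252, 53.252, 53.252,
               59.644, 59.644, 57.822, 57.822, 53.252, 53.252])

# shift() moves every value down one row (the first becomes NaN), ne() flags
# rows that differ from the row above, and cumsum() turns each flag into a
# new run number.
run_id = x.ne(x.shift()).cumsum()
print(run_id.tolist())
# [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6]
```

Grouping by x1 and y1 together starts a new group whenever either column changes, so the second occurrence of 59.644 (run 4) is compared separately from the first one (run 1).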
# x1 and y1 number the runs of consecutive equal values in each column
x1 = df['x'].ne(df['x'].shift()).cumsum()
y1 = df['y'].ne(df['y'].shift()).cumsum()
# keep the row(s) whose weight equals the maximum weight of their run
df = df[df.groupby([x1, y1])['weight'].transform('max') == df['weight']]
print(df)

    index       x      y  weight
1       1  59.644  10.72   0.820
2       2  57.822  10.13   0.750
6     501  53.252  10.85   0.950
8    1000  59.644  10.72   0.850
10   1002  57.822  10.13   0.920
13   1201  53.252  10.85   1.098
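The answer's code can be wrapped into self-contained functions. This sketch makes assumptions that are not in the answer: requirement 1 is read as "a kept row must weigh at least 0.60" (the hypothetical min_weight parameter), and that check is applied after the run maximum so removed rows cannot merge neighbouring runs. A single run id is built with | instead of grouping by two helper Series, which gives the same groups because a new run starts whenever x or y changes. The hypothetical keep_one_per_run uses idxmax to keep exactly one row per run on ties, whereas the transform('max') comparison keeps every tied row.

```python
import pandas as pd


def run_ids(df):
    """Number runs of consecutive rows that share the same (x, y) pair."""
    changed = df["x"].ne(df["x"].shift()) | df["y"].ne(df["y"].shift())
    return changed.cumsum()


def keep_heaviest_per_run(df, min_weight=0.60):
    """Keep the max-weight row(s) of each run, dropping them if below min_weight.

    min_weight is a hypothetical parameter standing in for requirement 1.
    """
    run_max = df.groupby(run_ids(df))["weight"].transform("max")
    # Ties keep every maximal row, like the answer's == comparison.
    return df[(df["weight"] == run_max) & (df["weight"] >= min_weight)]


def keep_one_per_run(df):
    """Keep exactly one row per run: the first row holding the run's max weight."""
    return df.loc[df.groupby(run_ids(df))["weight"].idxmax()]


# Visible rows of the question's table only; the elided rows are left out.
df = pd.DataFrame(
    {
        "x": [59.644, 59.644, 57.822, 57.822, 57.822, 53.252, 53.252, 53.252,
              59.644, 59.644, 57.822, 57.822, 53.252, 53.252],
        "y": [10.72, 10.72, 10.13, 10.13, 10.13, 10.85, 10.85, 10.85,
              10.72, 10.72, 10.13, 10.13, 10.85, 10.85],
        "weight": [0.69, 0.82, 0.75, 0.68, 0.20, 0.15, 0.95, 0.69,
                   0.85, 0.73, 0.92, 0.15, 0.78, 1.098],
    },
    index=[0, 1, 2, 3, 4, 500, 501, 502, 1000, 1001, 1002, 1003, 1200, 1201],
)

print(keep_heaviest_per_run(df))
print(keep_one_per_run(df))
```

idxmax returns index labels, so keep_one_per_run assumes the DataFrame index is unique.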
