Drop the rows corresponding to the last subgroup within each group

Posted 2024-04-28 05:00:24


Suppose I have the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame(['eggs', np.nan, 'ham', 'eggs', 'spam', 'spam',
                   'eggs', 'spam', np.nan], columns=['ingredients'])
df['customer'] = (['Badger']*3 + ['Shopkeeper']*3 + ['Pepperpots']*2
    + [np.nan])
df['ordered'] = [1, 1, 0, 0, 1, 0, 1, 0, np.nan]
df.sort_values(['customer', 'ingredients'], inplace=True)

It looks like this:

  ingredients    customer  ordered
0        eggs      Badger      1.0
2         ham      Badger      0.0
1         NaN      Badger      1.0
6        eggs  Pepperpots      1.0
7        spam  Pepperpots      0.0
3        eggs  Shopkeeper      0.0
4        spam  Shopkeeper      1.0
5        spam  Shopkeeper      0.0
8         NaN         NaN      NaN

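For each customer, I want to drop the rows corresponding to that customer's last ingredient (in alphabetical order).

For example, the rows with index 4 and 5 should be dropped, because they correspond to Shopkeeper's last ingredient (spam).

Likewise, row 7 should be dropped, because it corresponds to Pepperpots' last ingredient.

NaN values should be ignored.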


2 answers

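You can build a Series holding the groupwise "last" ingredient for every row, and then filter out the rows whose ingredient equals it. Note that rows with a NaN ingredient are not removed by this approach.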

s = df.sort_values('ingredients')\
      .groupby('customer')['ingredients']\
      .transform('last').sort_index()

df = df[df['ingredients'] != s]

print(df)

  ingredients    customer  ordered
0        eggs      Badger      1.0
1         NaN      Badger      1.0
3        eggs  Shopkeeper      0.0
6        eggs  Pepperpots      1.0
8         NaN         NaN      NaN

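With this solution you can omit df.sort_values(['customer', 'ingredients'], inplace=True): the sort by ingredients happens inside the chain, and .sort_index() puts the transformed Series back in the order of the original index, so it lines up row by row with the unsorted df. The output above keeps the original index order for that reason.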
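As a rough, self-contained check of the approach above, the sketch below rebuilds the question's data without the in-place sort and confirms that the Series produced by the chain carries the same index as df before the comparison. The names last_per_row, result and kept are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question, left in its original (unsorted) row order.
df = pd.DataFrame({
    "ingredients": ["eggs", np.nan, "ham", "eggs", "spam", "spam",
                    "eggs", "spam", np.nan],
    "customer": ["Badger"] * 3 + ["Shopkeeper"] * 3 + ["Pepperpots"] * 2
                + [np.nan],
    "ordered": [1, 1, 0, 0, 1, 0, 1, 0, np.nan],
})

# Sort by ingredient so that 'last' means "alphabetically last", broadcast
# that value to every row of the group, then restore the original row order.
last_per_row = (
    df.sort_values("ingredients")
      .groupby("customer")["ingredients"]
      .transform("last")
      .sort_index()
)

# After sort_index() the Series is labelled exactly like df, so the
# element-wise comparison lines up row by row.
same_labels = last_per_row.index.equals(df.index)

result = df[df["ingredients"] != last_per_row]
kept = result.index.tolist()
print(kept)
```

Rows whose ingredient is NaN survive because a comparison of NaN with anything using != evaluates to True.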

Use groupby followed by transform('last'), which skips NaN values by default, and then filter with boolean indexing:

s = df['ingredients'].groupby(df['customer']).transform('last')
df = df[df['ingredients'] != s]
print(df)
  ingredients    customer  ordered
0        eggs      Badger      1.0
1         NaN      Badger      1.0
6        eggs  Pepperpots      1.0
3        eggs  Shopkeeper      0.0
8         NaN         NaN      NaN
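This second approach only yields the alphabetically last ingredient when the frame has already been sorted by customer and ingredients, as it is in the question. A minimal sketch that makes this dependency explicit is shown below; the helper drop_last_per_group and its parameter names are hypothetical and do not come from the answer or from pandas.

```python
import numpy as np
import pandas as pd


def drop_last_per_group(frame, group_col, value_col):
    """Hypothetical helper: drop rows holding each group's last value.

    Sorting is done here, so the caller's row order does not matter.
    """
    ordered = frame.sort_values([group_col, value_col])
    # 'last' skips NaN values, so a trailing NaN never counts as the last value.
    last = ordered[value_col].groupby(ordered[group_col]).transform("last")
    return ordered[ordered[value_col] != last]


df = pd.DataFrame({
    "ingredients": ["eggs", np.nan, "ham", "eggs", "spam", "spam",
                    "eggs", "spam", np.nan],
    "customer": ["Badger"] * 3 + ["Shopkeeper"] * 3 + ["Pepperpots"] * 2
                + [np.nan],
    "ordered": [1, 1, 0, 0, 1, 0, 1, 0, np.nan],
})

result = drop_last_per_group(df, "customer", "ingredients")
print(result)
```

Because the transformed Series is computed from the already sorted frame, it shares that frame's index and no sort_index() call is needed before the comparison.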
