Sorting and regrouping using multiple conditions

Posted 2024-05-16 19:19:53


Here is the sample data:

Product     Type        Name    Time        Value
Product a   Medicare    CVS     2018-10-05  10
Product a   Medicare    Cigna   2018-10-05  20
Product a   Medicare    United  2018-10-05  30
Product a   Medicare    Humana  2018-10-05  40
Product a   Medicare    Centene 2018-10-05  50
Product a   Comm        CVS     2018-10-05  20
Product a   Comm        Cigna   2018-10-05  30
Product a   Comm        United  2018-10-05  40
Product a   Comm        Humana  2018-10-05  50
Product a   Comm        Centene 2018-10-05  60
Product a   Medicare    CVS     2019-10-03  30
Product a   Medicare    Cigna   2019-10-03  20
Product a   Medicare    United  2019-10-03  10
Product a   Medicare    Humana  2019-10-03  5
Product a   Medicare    Centene 2019-10-03  12
Product a   Comm        CVS     2019-10-03  87
Product a   Comm        Cigna   2019-10-03  43
Product a   Comm        United  2019-10-03  50
Product a   Comm        Humana  2019-10-03  30
Product a   Comm        Centene 2019-10-03  90

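First, I need to find the most recent week in the 'Time' column.

In the table above, that is 2019-10-03.

Then, for that week, I need to find the top 2 'Name' values by 'Value' within each 'Type'.

Finally, I need to create a DataFrame like the one below, keeping the rows for those names on every date:

For the week of 2019-10-03, the top two 'Name' values for 'Medicare' are CVS and Cigna. For the same week, the top two 'Name' values for 'Comm' are Centene and CVS.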

Product    Type         Name    Time       Value
Product a   Medicare    CVS     2018-10-05  10
Product a   Medicare    Cigna   2018-10-05  20
Product a   Comm        Centene 2018-10-05  60
Product a   Comm        CVS     2018-10-05  20
Product a   Medicare    CVS     2019-10-03  30
Product a   Medicare    Cigna   2019-10-03  20
Product a   Comm        Centene 2019-10-03  90
Product a   Comm        CVS     2019-10-03  87
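
For anyone who wants to reproduce the problem, here is a minimal sketch that loads the sample table into a DataFrame. The CSV layout and the variable names `raw` and `df` are assumptions made for illustration; the values are copied from the table in the question.

```python
import io

import pandas as pd

# Sample data from the question, rewritten as CSV text (layout is an assumption).
raw = """Product,Type,Name,Time,Value
Product a,Medicare,CVS,2018-10-05,10
Product a,Medicare,Cigna,2018-10-05,20
Product a,Medicare,United,2018-10-05,30
Product a,Medicare,Humana,2018-10-05,40
Product a,Medicare,Centene,2018-10-05,50
Product a,Comm,CVS,2018-10-05,20
Product a,Comm,Cigna,2018-10-05,30
Product a,Comm,United,2018-10-05,40
Product a,Comm,Humana,2018-10-05,50
Product a,Comm,Centene,2018-10-05,60
Product a,Medicare,CVS,2019-10-03,30
Product a,Medicare,Cigna,2019-10-03,20
Product a,Medicare,United,2019-10-03,10
Product a,Medicare,Humana,2019-10-03,5
Product a,Medicare,Centene,2019-10-03,12
Product a,Comm,CVS,2019-10-03,87
Product a,Comm,Cigna,2019-10-03,43
Product a,Comm,United,2019-10-03,50
Product a,Comm,Humana,2019-10-03,30
Product a,Comm,Centene,2019-10-03,90
"""

# Parse 'Time' as datetime so that max() picks the latest week reliably.
df = pd.read_csv(io.StringIO(raw), parse_dates=["Time"])
latest_week = df["Time"].max()
```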



2 Answers

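First filter the rows of the latest datetime to get the top Product, Type, Name combinations, then use merge to keep only those combinations across all datetimes: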

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])

# Rows of the latest date, top 2 by Value within each Product/Type
df1 = (df[df['Time'].eq(df['Time'].max())]
       .sort_values('Value', ascending=False)
       .groupby(['Product', 'Type'])
       .head(2))
print(df1)
      Product      Type     Name       Time  Value
19  Product a      Comm  Centene 2019-10-03     90
15  Product a      Comm      CVS 2019-10-03     87
10  Product a  Medicare      CVS 2019-10-03     30
11  Product a  Medicare    Cigna 2019-10-03     20

# Keep only those Product/Type/Name combinations on every date
df = (df.merge(df1[['Product', 'Type', 'Name']])
        .sort_values(['Product', 'Time', 'Type', 'Value'],
                     ascending=[True, True, True, False]))
print(df)
     Product      Type     Name       Time  Value
6  Product a      Comm  Centene 2018-10-05     60
4  Product a      Comm      CVS 2018-10-05     20
2  Product a  Medicare    Cigna 2018-10-05     20
0  Product a  Medicare      CVS 2018-10-05     10
7  Product a      Comm  Centene 2019-10-03     90
5  Product a      Comm      CVS 2019-10-03     87
1  Product a  Medicare      CVS 2019-10-03     30
3  Product a  Medicare    Cigna 2019-10-03     20
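
The two steps above can be collected into one helper. This is only a sketch under a few assumptions: the function name `top_names_from_latest` and its parameter `n` are hypothetical, the merge keys are spelled out with `on=` for clarity, and the index is reset at the end, so row labels will not match the printed output above.

```python
import pandas as pd


def top_names_from_latest(df, n=2):
    """Keep, on every date, the n names ranked highest by Value
    within each Product/Type on the latest date (hypothetical helper)."""
    df = df.assign(Time=pd.to_datetime(df["Time"]))
    latest = df[df["Time"].eq(df["Time"].max())]
    # Top n rows per Product/Type on the latest date, reduced to the key columns.
    keys = (latest.sort_values("Value", ascending=False)
                  .groupby(["Product", "Type"])
                  .head(n)[["Product", "Type", "Name"]])
    # Inner merge acts as a filter: only rows matching a key survive.
    return (df.merge(keys, on=["Product", "Type", "Name"])
              .sort_values(["Product", "Time", "Type", "Value"],
                           ascending=[True, True, True, False])
              .reset_index(drop=True))


# Sample values copied from the question.
names = ["CVS", "Cigna", "United", "Humana", "Centene"]
values = {
    ("Medicare", "2018-10-05"): [10, 20, 30, 40, 50],
    ("Comm", "2018-10-05"): [20, 30, 40, 50, 60],
    ("Medicare", "2019-10-03"): [30, 20, 10, 5, 12],
    ("Comm", "2019-10-03"): [87, 43, 50, 30, 90],
}
rows = [("Product a", typ, name, day, val)
        for (typ, day), vals in values.items()
        for name, val in zip(names, vals)]
df = pd.DataFrame(rows, columns=["Product", "Type", "Name", "Time", "Value"])

result = top_names_from_latest(df)
print(result)
```

Because the merge is an inner join on the key columns, it works as a filter here; the chosen names are decided once, from the latest date only.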

IIUC, sort the DataFrame first, then group and use head:

df.sort_values('Value', ascending=False)\
  .groupby(['Product', 'Type', 'Time'])\
  .head(2)\
  .sort_index()

Output:

      Product      Type     Name        Time  Value
3   Product a  Medicare   Humana  2018-10-05     40
4   Product a  Medicare  Centene  2018-10-05     50
8   Product a      Comm   Humana  2018-10-05     50
9   Product a      Comm  Centene  2018-10-05     60
10  Product a  Medicare      CVS  2019-10-03     30
11  Product a  Medicare    Cigna  2019-10-03     20
15  Product a      Comm      CVS  2019-10-03     87
19  Product a      Comm  Centene  2019-10-03     90
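
Comparing this output with the expected table in the question, the 2018-10-05 rows differ: grouping by 'Time' as well picks the top two names within each date separately, instead of reusing the names chosen from the latest week. The sketch below makes that visible on the question's data; the variable names `per_date` and `picked` are assumptions for illustration.

```python
import pandas as pd

# Sample values copied from the question.
names = ["CVS", "Cigna", "United", "Humana", "Centene"]
values = {
    ("Medicare", "2018-10-05"): [10, 20, 30, 40, 50],
    ("Comm", "2018-10-05"): [20, 30, 40, 50, 60],
    ("Medicare", "2019-10-03"): [30, 20, 10, 5, 12],
    ("Comm", "2019-10-03"): [87, 43, 50, 30, 90],
}
rows = [("Product a", typ, name, day, val)
        for (typ, day), vals in values.items()
        for name, val in zip(names, vals)]
df = pd.DataFrame(rows, columns=["Product", "Type", "Name", "Time", "Value"])

# Top 2 by Value within each Product/Type/Time group.
per_date = (df.sort_values("Value", ascending=False)
              .groupby(["Product", "Type", "Time"])
              .head(2))

# Collect the selected names per Type and date to compare them.
picked = per_date.groupby(["Type", "Time"])["Name"].apply(set)
print(picked)
```

If the goal is the table shown in the question, the names have to be fixed from the latest date first and then applied to the earlier dates.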
