How do I rank within a pandas groupby based on a condition in another column? (Example included)

Posted 2024-05-12 21:00:02


The DataFrame below has 4 columns: runner name, race date, height (in inches), and whether the runner finished in the top ten.

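I want to group by race date and, if a runner finished in the top ten on that race date, rank his height (in inches) among the other top-ten finishers on the same date, with the tallest ranked 1. How can I do this?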

Here is the original DataFrame:

>>> import pandas as pd
>>> d = {"runner":['mike','paul','jim','dave','douglas'],
...     "race_date":['2019-02-02','2019-02-02','2020-02-02','2020-02-01','2020-02-01'],
...      "height_in_inches":[72,68,70,74,73],
...     "top_ten_finish":["yes","yes","no","yes","no"]}
>>> df = pd.DataFrame(d)
>>> df
    runner   race_date  height_in_inches top_ten_finish
0     mike  2019-02-02                72            yes
1     paul  2019-02-02                68            yes
2      jim  2020-02-02                70             no
3     dave  2020-02-01                74            yes
4  douglas  2020-02-01                73             no
>>> 

This is the result I want. Note that if a runner did not finish in the top ten of his race, the new column's value should be 0:

    runner   race_date  height_in_inches top_ten_finish  if_top_ten_height_rank
0     mike  2019-02-02                72            yes                       1
1     paul  2019-02-02                68            yes                       2
2      jim  2020-02-02                70             no                       0
3     dave  2020-02-01                74            yes                       1
4  douglas  2020-02-01                73             no                       0

Thanks, everyone!


2 answers

Filter to the top-ten rows first, then use groupby + rank:

df['rank']=df[df.top_ten_finish.eq('yes')].groupby('race_date')['height_in_inches'].rank(ascending=False)
df['rank'] = df['rank'].fillna(0)
df
Out[87]: 
    runner   race_date  height_in_inches top_ten_finish  rank
0     mike  2019-02-02                72            yes   1.0
1     paul  2019-02-02                68            yes   2.0
2      jim  2020-02-02                70             no   0.0
3     dave  2020-02-01                74            yes   1.0
4  douglas  2020-02-01                73             no   0.0
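The same filter-then-rank idea can be packaged as a small, self-contained script. This is a sketch under two assumptions not stated in the question: ties in height should share the better rank (`method="min"`; the pandas default is `"average"`, which would give two tied runners 1.5 each), and the new column should hold integers like the desired output rather than the floats `rank()` returns. The helper name `top_ten_height_rank` is hypothetical.

```python
import pandas as pd


def top_ten_height_rank(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: rank heights (tallest = 1) among the top-ten
    finishers of each race date; runners outside the top ten get 0."""
    top_ten = frame[frame["top_ten_finish"].eq("yes")]
    # Rank within each race date; method="min" lets tied heights share the
    # better rank (e.g. 74, 74, 70 -> 1, 1, 3). This tie rule is an assumption.
    ranks = top_ten.groupby("race_date")["height_in_inches"].rank(
        ascending=False, method="min"
    )
    # Bring back the filtered-out rows (as NaN), set them to 0,
    # and cast the float ranks to integers.
    return ranks.reindex(frame.index).fillna(0).astype(int)


df = pd.DataFrame({
    "runner": ["mike", "paul", "jim", "dave", "douglas"],
    "race_date": ["2019-02-02", "2019-02-02", "2020-02-02", "2020-02-01", "2020-02-01"],
    "height_in_inches": [72, 68, 70, 74, 73],
    "top_ten_finish": ["yes", "yes", "no", "yes", "no"],
})
df["if_top_ten_height_rank"] = top_ten_height_rank(df)
print(df)
```

Returning a new Series and assigning it, instead of calling `fillna(..., inplace=True)` on a selected column, avoids the chained-assignment warnings that newer pandas versions emit for that pattern.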

You can filter, rank within a groupby(), and then reassign the result to the full DataFrame:

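df['if_top_ten_height_rank'] = (df.loc[df['top_ten_finish']=='yes','height_in_inches']  # heights of top-ten finishers only
                                   .groupby(df['race_date']).rank(ascending=False)     # rank per race date, tallest = 1
                                   .reindex(df.index, fill_value=0)                    # restore all rows; non-top-ten get 0
                                   .astype(int)                                        # rank() returns floats
                                )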

Output:

    runner   race_date  height_in_inches top_ten_finish  if_top_ten_height_rank
0     mike  2019-02-02                72            yes                       1
1     paul  2019-02-02                68            yes                       2
2      jim  2020-02-02                70             no                       0
3     dave  2020-02-01                74            yes                       1
4  douglas  2020-02-01                73             no                       0
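A variant of the same idea that skips the filter/reindex round trip: blank out the heights of non-top-ten runners with `Series.where`, then rank the whole column. This relies on `rank()` leaving NaN values unranked by default (`na_option="keep"`), so masked runners neither receive a rank nor push anyone else's rank down. It is a sketch, not taken from either answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "runner": ["mike", "paul", "jim", "dave", "douglas"],
    "race_date": ["2019-02-02", "2019-02-02", "2020-02-02", "2020-02-01", "2020-02-01"],
    "height_in_inches": [72, 68, 70, 74, 73],
    "top_ten_finish": ["yes", "yes", "no", "yes", "no"],
})

is_top_ten = df["top_ten_finish"].eq("yes")

# Heights of non-top-ten runners become NaN; the index stays unchanged,
# so the result can be grouped by df["race_date"] directly.
masked_heights = df["height_in_inches"].where(is_top_ten)

df["if_top_ten_height_rank"] = (
    masked_heights.groupby(df["race_date"])
    .rank(ascending=False)  # NaN stays NaN (na_option="keep")
    .fillna(0)              # non-top-ten runners get 0
    .astype(int)
)
print(df)
```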
