Adding a sequence number to the result of a groupby().head(n) expression

Posted 2024-04-26 07:19:19


I have a pandas expression that returns the top three values for each country, sorted:

Country              | Value
---------------------|------
Germany              | 102.1
Germany              | 90.3
Germany              | 44.6
Switzerland          | 59.9
Switzerland          | 35.3
Switzerland          | 21.6

...and so on

This is what I get from df.groupby("Country").head(3)[["Country", "Value"]]. Now I would like to add a third column that gives each value's rank within its country:

Country              | Value  | Rank
---------------------|--------|------
Germany              | 102.1  | 1
Germany              | 90.3   | 2
Germany              | 44.6   | 3
Switzerland          | 59.9   | 1
Switzerland          | 35.3   | 2
Switzerland          | 21.6   | 3

...and so on

What is the best way to do this?


1 answer
User
#1 · Posted 2024-04-26 07:19:19

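I believe you need GroupBy.rank with method='dense'. With dense ranking, the rank always increases by exactly 1 from one distinct value to the next within a group (tied values share a rank). rank returns floats, so the result is converted to integers: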

df['Rank'] = df.groupby("Country")["Value"].rank(method='dense', ascending=False).astype(int)
print(df)
       Country  Value  Rank
0      Germany  102.1     1
1      Germany   90.3     2
2      Germany   44.6     3
3  Switzerland   59.9     1
4  Switzerland   35.3     2
5  Switzerland   21.6     3
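
For a version that can be run as-is, here is a minimal sketch of the rank call above. It assumes the frame contains only the six rows shown in the question; the DataFrame construction is an illustration, not part of the original post.

```python
import pandas as pd

# Assumption: the frame holds only the six rows listed in the question.
df = pd.DataFrame({
    "Country": ["Germany"] * 3 + ["Switzerland"] * 3,
    "Value": [102.1, 90.3, 44.6, 59.9, 35.3, 21.6],
})

# Dense rank within each country, largest value first.
# rank() returns floats, so the result is cast to int.
df["Rank"] = (
    df.groupby("Country")["Value"]
      .rank(method="dense", ascending=False)
      .astype(int)
)
print(df)
```

Because rank works from the values themselves, the rows do not have to be sorted beforehand.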

If you need a plain counter instead, it is better to use GroupBy.cumcount:

df['Rank1'] = df.groupby("Country").cumcount() + 1
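
Since cumcount numbers rows in the order they appear, it only matches the rank when each group is already sorted by value. A rough sketch of combining it with the head(3) expression from the question follows. The names raw and top3 and the extra rows (12.0 and 5.5) are hypothetical, added only so that head(3) has something to drop.

```python
import pandas as pd

# Hypothetical unsorted input with four rows per country.
raw = pd.DataFrame({
    "Country": ["Germany", "Switzerland", "Germany", "Germany",
                "Switzerland", "Germany", "Switzerland", "Switzerland"],
    "Value": [44.6, 35.3, 102.1, 12.0, 59.9, 90.3, 21.6, 5.5],
})

# Sort so that the largest values come first within each country,
# then keep the first three rows of every group.
top3 = (
    raw.sort_values(["Country", "Value"], ascending=[True, False])
       .groupby("Country")
       .head(3)
       .reset_index(drop=True)
)

# The rows are now in descending order per country, so a running
# counter starting at 1 doubles as the rank.
top3["Rank1"] = top3.groupby("Country").cumcount() + 1
print(top3)
```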

The difference is easiest to see with data that is not sorted by value:

print(df)
       Country  Value
0      Germany   90.3  <- second largest in its group, rank 2
1      Germany  102.1  <- largest in its group, rank 1
2      Germany   44.6  <- third largest in its group, rank 3
3  Switzerland   21.6
4  Switzerland   35.3
5  Switzerland   59.9

df['Rank'] = df.groupby("Country")["Value"].rank(method='dense', ascending=False).astype(int)
df['Rank1'] = df.groupby("Country").cumcount() + 1

print(df)
       Country  Value  Rank  Rank1
0      Germany   90.3     2      1
1      Germany  102.1     1      2
2      Germany   44.6     3      3
3  Switzerland   21.6     3      1
4  Switzerland   35.3     2      2
5  Switzerland   59.9     1      3
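
The two columns also differ when a group contains equal values. The sketch below uses made-up data in which Germany has two rows tied at 90.3; the frame name tied and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: two Germany rows share the value 90.3.
tied = pd.DataFrame({
    "Country": ["Germany", "Germany", "Germany", "Switzerland", "Switzerland"],
    "Value": [90.3, 90.3, 44.6, 59.9, 35.3],
})

# Dense rank: tied values share a rank and the next distinct value gets rank + 1.
tied["Rank"] = (
    tied.groupby("Country")["Value"]
        .rank(method="dense", ascending=False)
        .astype(int)
)

# Counter: every row gets its own number, based purely on row order.
tied["Rank1"] = tied.groupby("Country").cumcount() + 1
print(tied)
```

Here Rank comes out as 1, 1, 2 for Germany while Rank1 is 1, 2, 3, so the choice depends on whether tied values should share a number.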
