How do I use groupby() in this case?

Posted 2024-04-29 16:40:47


Suppose I have a DataFrame:

country       edition  sports       Athletes               Medals
Germany          1990    Aquatics  HAJOS, Alfred           silver
Germany          1990    Aquatics  HIRSCHMANN, Otto        silver
Germany          1990    Aquatics  DRIVAS, Dimitrios       silver
US               2008    Athletics MALOKINIS, Ioannis      silver
US               2008    Athletics HAJOS, Alfred           silver
US               2009    Athletics CHASAPIS, Spiridon      gold
France           2010    Athletics CHOROPHAS, Efstathios   gold
France           2010    golf      HAJOS, Alfred           silver
France           2011    golf      ANDREOU, Joannis        silver

I want to know which edition awarded the most silver medals. So I tried to solve it with groupby:

df.groupby('Edition')[df['Medal']=='Silver'].count().idxmax() 

But it gives me

Key error = 'Columns not found: False, True'

Can anyone tell me what the problem is?


3 answers
User
#1 · Posted 2024-04-29 16:40:47

Here is your pandas DataFrame:

import pandas as pd

data = [
    ['Germany', 1990, 'Aquatics', 'HAJOS, Alfred', 'silver'], 
    ['Germany', 1990, 'Aquatics', 'HIRSCHMANN, Otto', 'silver'],
    ['Germany', 1990, 'Aquatics', 'DRIVAS, Dimitrios', 'silver'],
    ['US', 2008, 'Athletics', 'MALOKINIS, Ioannis', 'silver'],
    ['US', 2008, 'Athletics', 'HAJOS, Alfred', 'silver'],
    ['US', 2009, 'Athletics', 'CHASAPIS, Spiridon', 'gold'],
    ['France', 2010, 'Athletics', 'CHOROPHAS, Efstathios', 'gold'],
    ['France', 2010, 'golf', 'HAJOS, Alfred', 'silver'],
    ['France', 2011, 'golf', 'ANDREOU, Joannis', 'silver']
]

# Column names are case-sensitive and must match the sample table exactly.
df = pd.DataFrame(data, columns=['country', 'edition', 'sports', 'Athletes', 'Medals'])
print(df)

   country  edition     sports               Athletes  Medals
0  Germany     1990   Aquatics          HAJOS, Alfred  silver
1  Germany     1990   Aquatics       HIRSCHMANN, Otto  silver
2  Germany     1990   Aquatics      DRIVAS, Dimitrios  silver
3       US     2008  Athletics     MALOKINIS, Ioannis  silver
4       US     2008  Athletics          HAJOS, Alfred  silver
5       US     2009  Athletics     CHASAPIS, Spiridon    gold
6   France     2010  Athletics  CHOROPHAS, Efstathios    gold
7   France     2010       golf          HAJOS, Alfred  silver
8   France     2011       golf       ANDREOU, Joannis  silver

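Now you only need to filter for the silver medals, then group by edition, and finally take the counts. Note that column names are case-sensitive: 'Edition' raises a KeyError, while 'edition' works. Likewise the column is 'Medals', not 'Medal', and its values are lowercase 'silver'. The original attempt also indexed the groupby object with a boolean Series, so pandas treated True and False as column names, which is where 'Columns not found: False, True' comes from: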

df[df.Medals == 'silver'].groupby('edition').count()['Medals'].idxmax()
>>> 1990
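
The same steps can be written out one at a time. This is a minimal sketch on a reduced copy of the sample data (only the 'edition' and 'Medals' columns); the intermediate names `silver`, `silver_per_edition`, and `best_edition` are introduced here for illustration and do not appear in the question.

```python
import pandas as pd

# Reduced copy of the sample data: only the two columns the question needs.
df = pd.DataFrame({
    'edition': [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    'Medals': ['silver', 'silver', 'silver', 'silver', 'silver',
               'gold', 'gold', 'silver', 'silver'],
})

# Step 1: apply the boolean mask to the DataFrame itself,
# not to the groupby object.
silver = df[df['Medals'] == 'silver']

# Step 2: count the silver rows in each edition.
silver_per_edition = silver.groupby('edition')['Medals'].count()

# Step 3: idxmax() returns the index label (the edition) with the largest count.
best_edition = silver_per_edition.idxmax()

print(silver_per_edition)
print(best_edition)
```

Selecting the 'Medals' column before calling count() keeps the intermediate result a Series, so only one column is counted instead of every column in the frame.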
User
#2 · Posted 2024-04-29 16:40:47

df[df['Medals']=='silver'].groupby('edition').size().idxmax()

I tried this and it worked! I just replaced count() with size().
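
For context on why the swap matters: on a groupby object, size() returns a Series with the number of rows per group, while count() returns a DataFrame with the number of non-null values per column. A small sketch on a made-up frame (`toy` and its values are hypothetical, chosen only to include one missing value):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing athlete name.
toy = pd.DataFrame({
    'edition': [1990, 1990, 2008],
    'Athletes': ['A', np.nan, 'B'],
})

grouped = toy.groupby('edition')

sizes = grouped.size()    # Series: rows per group, missing values included
counts = grouped.count()  # DataFrame: non-null values per remaining column

print(sizes)
print(counts)
```

Because size() already yields a single Series, idxmax() can be called on it directly and returns one edition.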

User
#3 · Posted 2024-04-29 16:40:47

You can also solve it by grouping on two columns:

df[df['Medals'] == 'silver'].groupby(['edition','Medals'],as_index=True)['Athletes'].count().idxmax()

# Outcome:
(1990, 'silver')
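
The result is a tuple because grouping on two columns produces a two-level index, and idxmax() returns the full index label. A minimal sketch of unpacking it, assuming the same sample data (the names `result`, `edition`, and `medal` are illustrative):

```python
import pandas as pd

# Sample data from the question, limited to the columns used here.
df = pd.DataFrame({
    'edition': [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    'Athletes': ['HAJOS, Alfred', 'HIRSCHMANN, Otto', 'DRIVAS, Dimitrios',
                 'MALOKINIS, Ioannis', 'HAJOS, Alfred', 'CHASAPIS, Spiridon',
                 'CHOROPHAS, Efstathios', 'HAJOS, Alfred', 'ANDREOU, Joannis'],
    'Medals': ['silver', 'silver', 'silver', 'silver', 'silver',
               'gold', 'gold', 'silver', 'silver'],
})

# Grouping on two keys gives a MultiIndex of (edition, medal) pairs.
result = (
    df[df['Medals'] == 'silver']
    .groupby(['edition', 'Medals'])['Athletes']
    .count()
    .idxmax()
)

# Unpack the tuple to get the edition on its own.
edition, medal = result
print(edition, medal)
```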
