当其他列具有特定值时查找groupby的计数

2024-05-28 21:02:55 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个如下所示的数据框

    CHOICE 1            CHOICE 2        CHOICE 3            Gender
0   Chemistry           Earth Science   Applied Chemistry   M
1   Computer Science    Earth Science   Chemistry           M
2   Earth Science       Earth Science   Chemistry           M
3   Earth Science       Earth Science   Chemistry           F
4   Earth Science       Earth Science   Chemistry           F
5   Environmental       Biology         Earth Science       M
6   Geophysics          Earth Science   Computer Science    M
7   Earth Science       Earth Science   Earth Science       M
8   Earth Science       Earth Science   Earth Science       M
9   Earth Science       Earth Science   Earth Science       M
10  Earth Science       Earth Science   Earth Science       M
11  Earth Science       Earth Science   Earth Science       M
12  Earth Science       Earth Science   Earth Science       M
13  Earth Science       Earth Science   Earth Science       M
14  Earth Science       Earth Science   Earth Science       M
15  Biotechnology       Earth Science   Environmental       M
16  Chemistry           Earth Science   Environmental       M
17  Statistics          Earth Science   Environmental       M
18  Earth Science       Earth Science   Geophysics          F
19  Environmental       Biology         Earth Science       F
20  Geophysics          Earth Science   Geophysics          M
21  Chemistry           Earth Science   Mathematics         M
22  Chemistry           Earth Science   Mathematics         M
23  Statistics          Earth Science   Mathematics         F
24  Statistics          Earth Science   Mathematics         F
25  Geophysics          Earth Science   Physics             M
26  Chemistry           Earth Science   Statistics          M
27  Earth Science       Earth Science   Statistics          M
28  Earth Science       Earth Science   Statistics          M
29  Earth Science       Earth Science   Statistics          M
30  Geophysics          Earth Science   Statistics          F
31  Earth Science       Geophysics      Chemistry           M
32  Earth Science       Geophysics      Chemistry           M
33  Earth Science       Geophysics      Computer Science    M
34  Chemistry           Geophysics      Earth Science       M
35  Earth Science       Geophysics      Earth Science       M
36  Geophysics          Geophysics      Earth Science       F
37  Earth Science       Geophysics      Mathematics         F
38  Earth Science       Geophysics      Mathematics         M
39  Earth Science       Geophysics      Physics             M
40  Earth Science       Geophysics      Physics             M
41  Earth Science       Geophysics      Physics             M
42  Earth Science       Geophysics      Physics             M
43  Earth Science       Geophysics      Statistics          M
44  Earth Science       ENVR            BIOT                F
45  Earth Science       Geophysics      CHEM                M
46  Earth Science       Earth Science   Earth Science       M
47  Earth Science       Earth Science   Earth Science       M
48  Earth Science       Earth Science   Earth Science       F
49  Earth Science       Earth Science   Earth Science       F
50  Earth Science       Earth Science   Earth Science       F

我想计算其他三列中每种性别的计数,当它们的值是地球科学或地球物理学时

最终输出应该如下所示

Major          Gender
                         CHOICE 1     CHOICE 2      CHOICE 2
               M           count       count         count
Earth Science  
               F            count      count         count

               M            count      count         count
Geophysics
               F            count      count         count

有人能帮我达到期望的产量吗

提前谢谢


Tags: 数据countgendercomputersciencestatisticschoicechemistry
1条回答
网友
1楼 · 发布于 2024-05-28 21:02:55

首先使用^{},然后按^{}过滤,然后用^{}^{}重塑形状:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .groupby(['value', 'Gender','variable'])
         .size()
         .unstack())
print (df1)
variable              CHOICE 1  CHOICE 2  CHOICE 3
value         Gender                              
Earth Science F              8         9         5
              M             25        25        13
Geophysics    F              2         2         1
              M              3        12         1

如果需要来自MultiIndex的列:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .groupby(['value', 'Gender','variable'])
         .size()
         .unstack()
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
           value Gender  CHOICE 1  CHOICE 2  CHOICE 3
0  Earth Science      F         8         9         5
1  Earth Science      M        25        25        13
2     Geophysics      F         2         2         1
3     Geophysics      M         3        12         1

使用^{}的另一种解决方案:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .pivot_table(index=['value', 'Gender'],columns='variable', aggfunc='size')
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
           value Gender  CHOICE 1  CHOICE 2  CHOICE 3
0  Earth Science      F         8         9         5
1  Earth Science      M        25        25        13
2     Geophysics      F         2         2         1
3     Geophysics      M         3        12         1

相关问题 更多 >

    热门问题