Getting value counts for a column within hierarchically grouped data

Posted 2024-05-23 15:03:34


I have a DataFrame that looks like this:

    Category    Shuffled        Name     Sequence    Length
0        pgm           0    protein1         IAAI         4
1        pgm           0    protein2         PGGP         4
2        pgm           0    protein3         KIIK         4
3        pgm           0    protein4         PGGP         4
4        btn           0    protein1         ABBA         4
5        btn           0    protein2         IAAI         4
6        btn           0    protein3         ABBA         4
7        btn           0    protein4         PGGP         4
8        pgm           1    protein1         IAAI         4
9        pgm           1    protein2         PGGP         4
10       pgm           1    protein3         KIIK         4
11       pgm           1    protein4         PGGP         4
12       btn           1    protein1         ABBA         4
13       btn           1    protein2         IAAI         4
14       btn           1    protein3         ABBA         4
15       btn           1    protein4         PGGP         4
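For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the table above. The variable names `sequences`, `rows`, and `df` are assumptions chosen for illustration; only the column names and values come from the question.

```python
import pandas as pd

# Sequences per category, in protein1..protein4 order (taken from the table).
sequences = {
    "pgm": ["IAAI", "PGGP", "KIIK", "PGGP"],
    "btn": ["ABBA", "IAAI", "ABBA", "PGGP"],
}

# Row order matches the table: Shuffled 0 (pgm, btn), then Shuffled 1 (pgm, btn).
rows = [
    {
        "Category": category,
        "Shuffled": shuffled,
        "Name": f"protein{i}",
        "Sequence": seq,
        "Length": len(seq),
    }
    for shuffled in (0, 1)
    for category in ("pgm", "btn")
    for i, seq in enumerate(sequences[category], start=1)
]
df = pd.DataFrame(rows)
print(df)
```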

I want to count how many times each Sequence occurs within each Category/Shuffled group and add that count as a new column. The result should look like this:

    Category    Shuffled        Name     Sequence    Length    Sequence_count
0        pgm           0    protein1         IAAI         4                 1
1        pgm           0    protein2         PGGP         4                 2
2        pgm           0    protein3         KIIK         4                 1
3        pgm           0    protein4         PGGP         4                 2
4        btn           0    protein1         ABBA         4                 2
5        btn           0    protein2         IAAI         4                 1
6        btn           0    protein3         ABBA         4                 2
7        btn           0    protein4         PGGP         4                 1
8        pgm           1    protein1         IAAI         4                 1
9        pgm           1    protein2         PGGP         4                 2
10       pgm           1    protein3         KIIK         4                 1
11       pgm           1    protein4         PGGP         4                 2
12       btn           1    protein1         ABBA         4                 2
13       btn           1    protein2         IAAI         4                 1
14       btn           1    protein3         ABBA         4                 2
15       btn           1    protein4         PGGP         4                 1

What I have tried so far, and which works, is:

counts = df.groupby(['Category', 'Shuffled'])['Sequence'].value_counts()

which gives me:

Category    Shuffled    Sequence
pgm         0           PGGP        2
                        IAAI        1
                        KIIK        1
            1           PGGP        2
                        IAAI        1
                        KIIK        1
btn         0           ABBA        2
                        IAAI        1
                        PGGP        1
            1           ABBA        2
                        IAAI        1
                        PGGP        1
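The blank cells in that printout are repeated index labels: `counts` is a Series with a three-level MultiIndex (Category, Shuffled, Sequence), not a table with one row per original record. As a rough sketch, it can be flattened into an ordinary frame like this. The name `flat` and the column name `Sequence_count` are assumptions; `rename_axis` is used only to make the level names explicit, in case the pandas version in use names them differently.

```python
import pandas as pd

# Compact rebuild of the sample frame from the question.
df = pd.DataFrame({
    "Category": (["pgm"] * 4 + ["btn"] * 4) * 2,
    "Shuffled": [0] * 8 + [1] * 8,
    "Name": [f"protein{i}" for i in range(1, 5)] * 4,
    "Sequence": ["IAAI", "PGGP", "KIIK", "PGGP", "ABBA", "IAAI", "ABBA", "PGGP"] * 2,
    "Length": 4,
})

counts = df.groupby(["Category", "Shuffled"])["Sequence"].value_counts()

# Turn the three index levels into regular columns, one row per unique combination.
flat = (
    counts.rename_axis(["Category", "Shuffled", "Sequence"])
    .reset_index(name="Sequence_count")
)
print(flat)
```

Note that `flat` has one row per distinct (Category, Shuffled, Sequence) combination, so it is shorter than the original frame.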

These are the values I want, but how do I get them onto the corresponding rows of the original DataFrame?
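One possible approach, offered as a sketch rather than the only answer: group by all three columns and use `transform`, which returns a result aligned with the original rows instead of one row per group. The column name `Sequence_count` comes from the desired output above; the frame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

# Compact rebuild of the sample frame from the question.
df = pd.DataFrame({
    "Category": (["pgm"] * 4 + ["btn"] * 4) * 2,
    "Shuffled": [0] * 8 + [1] * 8,
    "Name": [f"protein{i}" for i in range(1, 5)] * 4,
    "Sequence": ["IAAI", "PGGP", "KIIK", "PGGP", "ABBA", "IAAI", "ABBA", "PGGP"] * 2,
    "Length": 4,
})

# transform("size") broadcasts each group's row count back to every row of that group,
# so the result has the same length and index as df.
df["Sequence_count"] = (
    df.groupby(["Category", "Shuffled", "Sequence"])["Sequence"].transform("size")
)
print(df)
```

An alternative would be to flatten the `value_counts()` result and `merge` it back on the three key columns; `transform` avoids the extra join and keeps the original row order.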

