pandas: count the values in each column of a DataFrame

Asked 2025-04-18 01:31

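I'm looking for a way to count how many times each value appears in a column, and it turned out to be much more complicated than I expected.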

       Percentile   Percentile1 Percentile2 Percentile3
0       mediocre    contender   contender   mediocre
69      mediocre    bad         mediocre    mediocre
117     mediocre    mediocre    mediocre    mediocre
144     mediocre    none        mediocre    contender
171     mediocre    mediocre    contender   mediocre

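The output I want looks roughly like the table below: a count of each of the four options in every column. Essentially, it's calling `value_counts` on each column. Any help would be much appreciated.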

         Percentile     Percentile1     Percentile2     Percentile3
mediocre:    5               2               3               4
contender:   0               1               2               1
bad:         0               1               0               0
none:        0               1               0               0
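The "`value_counts` on each column" idea can be written directly with `DataFrame.apply`. A minimal sketch, assuming the sample data above is typed in as a DataFrame (index values copied from the question):

```python
import pandas as pd

# Sample data from the question, with the same (non-contiguous) index.
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Run Series.value_counts on every column. A value that never occurs in a
# column comes back as NaN, so fill with 0 and restore an integer dtype.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

Note that `"none"` here is the literal string from the data, not a missing value; real `NaN`/`None` cells would be skipped by `value_counts` unless `dropna=False` is passed.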

1 Answer


First, it helps to make your data "tidy". Tidy data means that each column represents one variable and each row represents one observation.

In [98]: df
Out[98]: 
    Percentile Percentile1 Percentile2 Percentile3
0     mediocre   contender   contender    mediocre
69    mediocre         bad    mediocre    mediocre
117   mediocre    mediocre    mediocre    mediocre
144   mediocre        none    mediocre   contender
171   mediocre    mediocre   contender    mediocre

[5 rows x 4 columns]

In this case, melting the DataFrame makes the data tidy:

In [125]: melted = pd.melt(df); melted
Out[125]: 
       variable      value
0    Percentile   mediocre
1    Percentile   mediocre
2    Percentile   mediocre
3    Percentile   mediocre
4    Percentile   mediocre
5   Percentile1  contender
6   Percentile1        bad
7   Percentile1   mediocre
8   Percentile1       none
9   Percentile1   mediocre
10  Percentile2  contender
11  Percentile2   mediocre
12  Percentile2   mediocre
13  Percentile2   mediocre
14  Percentile2  contender
15  Percentile3   mediocre
16  Percentile3   mediocre
17  Percentile3   mediocre
18  Percentile3  contender
19  Percentile3   mediocre

[20 rows x 2 columns]
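With no `id_vars`, `pd.melt` stacks every column into (column name, cell value) pairs, so 5 rows × 4 columns become 20 rows. The default `variable`/`value` names can be changed with `var_name` and `value_name`; the names `"column"` and `"rating"` below are arbitrary choices for illustration, not from the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Unpivot all columns: old column names go into "column", cell contents
# into "rating". The original row index is not kept.
long = pd.melt(df, var_name="column", value_name="rating")
print(long.shape)  # (20, 2)

# Overall frequency of each rating across all four columns.
print(long["rating"].value_counts())
```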

Then use `crosstab` to build a frequency table:

In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]: 
variable   Percentile  Percentile1  Percentile2  Percentile3
value                                                       
bad                 0            1            0            0
contender           0            1            2            1
mediocre            5            2            3            4
none                0            1            0            0

[4 rows x 4 columns]
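For comparison, the same table can be built with `groupby`: count each (value, variable) pair with `size()`, then pivot `variable` into columns with `unstack(fill_value=0)` so that absent pairs become 0 rather than `NaN`. A sketch on the question's sample data; `crosstab` also accepts the Series directly, without wrapping them in lists:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)
melted = pd.melt(df)

# Frequency table via crosstab: rows are values, columns are original columns.
table = pd.crosstab(melted["value"], melted["variable"])

# Equivalent groupby formulation: count pairs, then spread "variable" across
# the columns, filling combinations that never occur with 0.
alt = melted.groupby(["value", "variable"]).size().unstack(fill_value=0)

print(table)
print(alt)
```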
