Group by user and count per category

Posted 2024-05-15 20:58:04


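How can I use pandas to create a frequency count of each category for each user? I want to do this so that I can pivot the result into a utility matrix.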

| | **author** | **category** |
|--|--|--|
| 0 | A | movies |
| 1 | B | games |
| 2 | C | pics |
| 4 | A | movies |
| 5 | C | movies |
| 6 | B | games |




| **author** | **category** | **category count** |
|--|--|--|
| A | movies | 2 |
| B | games | 2 |
| C | movies | 1 |
| C | pics | 1 |

1 answer

Answer #1 · posted 2024-05-15 20:58:04

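You can use groupby together with size to get the number of rows for each combination of the author and category columns. The output is a Series with a MultiIndex.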

print (df.groupby(['author','category']).size())
author  category
A       movies      2
B       games       2
C       movies      1
        pics        1
dtype: int64

Then add reset_index to turn the MultiIndex levels into columns and to set the name of the value column. The output is a DataFrame.

# store the result under a new name so the original df stays available below
counts = df.groupby(['author','category']).size().reset_index(name='category count')
print (counts)
  author category  category count
0      A   movies               2
1      B    games               2
2      C   movies               1
3      C     pics               1
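
The two steps above can be tried end to end with a minimal sketch. It assumes the question's table is loaded into a DataFrame called `df` (built by hand here, including the skipped index 3); the names `sizes` and `counts` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the index skips 3 as in the original table.
df = pd.DataFrame(
    {
        "author": ["A", "B", "C", "A", "C", "B"],
        "category": ["movies", "games", "pics", "movies", "movies", "games"],
    },
    index=[0, 1, 2, 4, 5, 6],
)

# Count rows per (author, category) pair; the result is a Series with a MultiIndex.
sizes = df.groupby(["author", "category"]).size()

# Move the MultiIndex levels back into columns and name the count column.
counts = sizes.reset_index(name="category count")
print(counts)
```

Keeping the result in a separate variable leaves `df` untouched, so it can still be used for the reshaping solutions.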

But if you need the pivoted table (the utility matrix), there are several solutions:

#add unstack for reshape
df1 = df.groupby(['author','category']).size().unstack(fill_value=0)
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

df1 = pd.crosstab(df['author'],df['category'])
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

df1 = df.pivot_table(index='author',columns='category', aggfunc='size', fill_value=0)
print (df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1
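
Since the question mentions pivoting the counts afterwards, here is a hedged sketch of that route as well: it first builds the long-format count table and then reshapes it with `DataFrame.pivot`. This is not one of the solutions listed in the answer, only an illustration that the long table leads to the same matrix; the names `counts` and `utility` are assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "author": ["A", "B", "C", "A", "C", "B"],
        "category": ["movies", "games", "pics", "movies", "movies", "games"],
    }
)

# Long format: one row per (author, category) pair with its count.
counts = df.groupby(["author", "category"]).size().reset_index(name="category count")

# Wide format: authors as rows, categories as columns.
# Pairs that never occur become NaN after pivot, so fill them with 0
# and cast back to integers.
utility = (
    counts.pivot(index="author", columns="category", values="category count")
    .fillna(0)
    .astype(int)
)
print(utility)
```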

Edit:

What is the difference between size and count in pandas?
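
In short, size counts all rows in a group, while count only counts non-missing values. A small sketch with hypothetical data (the `demo` frame below is not from the question) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data: author A has one missing category value.
demo = pd.DataFrame(
    {
        "author": ["A", "A", "B"],
        "category": ["movies", np.nan, "games"],
    }
)

grouped = demo.groupby("author")["category"]

# size: number of rows in each group, missing values included.
group_sizes = grouped.size()

# count: number of non-missing values in each group.
group_counts = grouped.count()

print(group_sizes)
print(group_counts)
```

For the data in the question there are no missing values, so both methods would give the same numbers.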
