How to get category percentages in pandas

import pandas as pd

data1 = {
    'Group': ['Winner', 'Winner', 'Winner', 'Loser', 'Loser', 'Loser'],
    'MathStudy': ['Read', 'Read', 'Notes', 'Cheat', 'Cheat', 'Read'],
    'ScienceStudy': ['Notes', 'Read', 'Cheat', 'Cheat', 'Read', 'Notes'],
}
df1 = pd.DataFrame(data=data1)

3 Answers

User

Answer 1 · edited 2024-06-16 14:16:20

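@jezrael's solution is intuitive, and it is what I would reach for first. However, I recently learned that melt often performs poorly. If performance matters, for example in code that runs repeatedly, here is an alternative: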

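# Group once, then compute per-group value shares for each study column
g = df1.groupby('Group')
cols = ['MathStudy', 'ScienceStudy']
out = (
    pd.concat({col: g[col].value_counts(normalize=True) for col in cols})
    .unstack(level=-1, fill_value=0)  # study methods become columns
)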

Runtime:

2.9 ms ± 96.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Compared with the melt approach:

9.44 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Output:

                        Cheat     Notes      Read
MathStudy    Loser   0.666667  0.000000  0.333333
             Winner  0.000000  0.333333  0.666667
ScienceStudy Loser   0.333333  0.333333  0.333333
             Winner  0.333333  0.333333  0.333333

Note: pd.crosstab is essentially groupby() plus some extra bookkeeping, and a groupby on two columns is usually much slower.
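To illustrate the note above, here is a minimal sketch showing that, for a single column, a row-normalized pd.crosstab and a groupby with value_counts(normalize=True) give the same shares. It rebuilds the question's df1; the names ct and gb are only illustrative, and the sketch makes no claim about relative speed.

```python
import numpy as pd_np_placeholder  # placeholder alias, replaced just below
import numpy as np
import pandas as pd

data1 = {
    'Group': ['Winner', 'Winner', 'Winner', 'Loser', 'Loser', 'Loser'],
    'MathStudy': ['Read', 'Read', 'Notes', 'Cheat', 'Cheat', 'Read'],
    'ScienceStudy': ['Notes', 'Read', 'Cheat', 'Cheat', 'Read', 'Notes'],
}
df1 = pd.DataFrame(data=data1)

# Row-normalized cross-tabulation of Group against MathStudy
ct = pd.crosstab(df1['Group'], df1['MathStudy'], normalize='index')

# The same shares via groupby + value_counts, reshaped to wide form
gb = (
    df1.groupby('Group')['MathStudy']
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Align labels before comparing, in case the two results order them differently
gb = gb.reindex(index=ct.index, columns=ct.columns)

print(np.allclose(ct.to_numpy(), gb.to_numpy()))
```

The answer's code applies the groupby form to each study column and stacks the results with pd.concat, which avoids reshaping the whole frame first.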

User

Answer 2 · edited 2024-06-16 14:16:20

Use DataFrame.melt with pd.crosstab and the normalize parameter:

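# Reshape to long format: one row per (Group, Type, value)
df1 = df1.melt('Group', var_name='Type')

# Cross-tabulate and normalize each row (normalize=0 is the same as 'index')
df2 = pd.crosstab([df1['Group'], df1['Type']], df1['value'], normalize=0)
print(df2)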
value                   Cheat     Notes      Read
Group  Type                                      
Loser  MathStudy     0.666667  0.000000  0.333333
       ScienceStudy  0.333333  0.333333  0.333333
Winner MathStudy     0.000000  0.333333  0.666667
       ScienceStudy  0.333333  0.333333  0.333333

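Finally, if you need the MultiIndex converted to columns and the "value" columns name removed, add DataFrame.rename_axis together with DataFrame.reset_index: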

df2 = df2.rename_axis(columns=None).reset_index()
print(df2)
    Group          Type     Cheat     Notes      Read
0   Loser     MathStudy  0.666667  0.000000  0.333333
1   Loser  ScienceStudy  0.333333  0.333333  0.333333
2  Winner     MathStudy  0.000000  0.333333  0.666667
3  Winner  ScienceStudy  0.333333  0.333333  0.333333
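The tables above hold fractions between 0 and 1. If values on a 0 to 100 scale are wanted, as the question title suggests, one option is to scale the normalized table afterwards. This is a minimal sketch that rebuilds the question's data; the names long_df, shares and percent are illustrative, and rounding to one decimal is an arbitrary choice.

```python
import pandas as pd

data1 = {
    'Group': ['Winner', 'Winner', 'Winner', 'Loser', 'Loser', 'Loser'],
    'MathStudy': ['Read', 'Read', 'Notes', 'Cheat', 'Cheat', 'Read'],
    'ScienceStudy': ['Notes', 'Read', 'Cheat', 'Cheat', 'Read', 'Notes'],
}
df1 = pd.DataFrame(data=data1)

# Long format, kept in a separate variable so df1 is not overwritten
long_df = df1.melt('Group', var_name='Type')

# Row-normalized shares per (Group, Type); 'index' is equivalent to normalize=0
shares = pd.crosstab(
    [long_df['Group'], long_df['Type']],
    long_df['value'],
    normalize='index',
)

# Scale to percentages and round for display
percent = shares.mul(100).round(1)
print(percent)
```

Keeping the unrounded shares around is useful if later steps need exact row sums of 1.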

User

Answer 3 · edited 2024-06-16 14:16:20

Here is another option:

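# Indicator columns per study method, summed per (Group, column) pair
g = df1.set_index('Group').stack().str.get_dummies().groupby(level=[0,1]).sum()
# Divide each row by its total to get shares
g.div(g.sum(axis=1),axis=0).round(2)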
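Since this answer shows no output, the following sketch splits the chained expression into its steps, applied to the question's df1. The intermediate names (stacked, dummies, counts, shares) are illustrative only and are not part of the original answer.

```python
import pandas as pd

data1 = {
    'Group': ['Winner', 'Winner', 'Winner', 'Loser', 'Loser', 'Loser'],
    'MathStudy': ['Read', 'Read', 'Notes', 'Cheat', 'Cheat', 'Read'],
    'ScienceStudy': ['Notes', 'Read', 'Cheat', 'Cheat', 'Read', 'Notes'],
}
df1 = pd.DataFrame(data=data1)

# 1. Long Series indexed by (Group, original column name)
stacked = df1.set_index('Group').stack()

# 2. One 0/1 indicator column per study method
dummies = stacked.str.get_dummies()

# 3. Count occurrences per (Group, original column name)
counts = dummies.groupby(level=[0, 1]).sum()

# 4. Divide each row by its total to turn counts into shares
shares = counts.div(counts.sum(axis=1), axis=0).round(2)
print(shares)
```

Because of the final round(2), the values are display-friendly approximations (0.67 rather than 0.666667), so rows may not sum to exactly 1.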
