Pandas pivot table on multiple columns

       nj          wd         wpt
      1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
ptype
1       1   1   1   0   2   1   2   1   0
2       0   1   1   1   0   1   0   1   1

nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
print(out)

        nj             wd            wpt
         1    2    3    1    2    3    1    2    3
ptype
1      1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
2      0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0
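The question does not show the input frame. The sketch below uses a hypothetical `df`, with rows chosen only so that the counts agree with the tables above, and runs the three-pivot approach on it. It uses `.loc` because `.ix` is no longer available in current pandas releases.

```python
import pandas as pd

# Hypothetical input: the question does not show the original frame.
# These rows are chosen only so that the counts match the tables above.
df = pd.DataFrame({
    "ptype": [1, 1, 1, 2, 2],
    "nj":    [1.0, 2.0, 3.0, 2.0, 3.0],
    "wd":    [2.0, 2.0, 3.0, 1.0, 3.0],
    "wpt":   [1.0, 1.0, 2.0, 2.0, 3.0],
})

# With aggfunc='count', each pivot counts the remaining columns, so any
# non-null column ('wd' or 'nj' here) can serve as the counter.
nj = df.pivot_table(index="ptype", columns="nj", aggfunc="count").loc[:, "wd"]
wpt = df.pivot_table(index="ptype", columns="wpt", aggfunc="count").loc[:, "wd"]
wd = df.pivot_table(index="ptype", columns="wd", aggfunc="count").loc[:, "nj"]

# Join the three count tables side by side under a top-level column key.
out = pd.concat([nj, wd, wpt], axis=1, keys=["nj", "wd", "wpt"]).fillna(0)
out.columns.names = [None, None]
print(out)
```

Combinations that never occur come back as NaN from each pivot, which is why the final `fillna(0)` is needed and why the counts are floats.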

3 Answers

User

#1 · Edited 2024-05-29 03:16:06

Instead of aggregating and pivoting in a single step, you can aggregate first and then pivot with the unstack method:

# Count the values of nj, wd and wpt within each ptype group
# using groupby + value_counts.
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(lambda col: col.value_counts()))
 .unstack(level=1)
 .fillna(0))

#      nj             wd            wpt
#       1    2    3    1    2    3    1    2    3
#ptype                                  
#1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
#2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

Another option that avoids the apply method:

(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
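To make the chain above easier to follow, here is the same sequence split into named steps. The frame `df` is a hypothetical sample (the original input is not shown in the thread), and the intermediate variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample frame; not taken from the original post.
df = pd.DataFrame({
    "ptype": [1, 1, 1, 2, 2],
    "nj":    [1.0, 2.0, 3.0, 2.0, 3.0],
    "wd":    [2.0, 2.0, 3.0, 1.0, 3.0],
    "wpt":   [1.0, 1.0, 2.0, 2.0, 3.0],
})

# Step 1: long form, one entry per (ptype, column name) pair.
stacked = df.set_index("ptype").stack()

# Step 2: count each distinct value inside every (ptype, column name) group.
counts = stacked.groupby(level=[0, 1]).value_counts()

# Step 3: move the column name and the value into the columns.
# Combinations that never occur become NaN, hence fillna(0).
wide = counts.unstack(level=[1, 2]).fillna(0).sort_index(axis=1)
print(wide)
```

The final `sort_index(axis=1)` matters because `value_counts` orders entries by frequency, so the columns would otherwise appear in an arbitrary order.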

Naive timings on the sample data:

Original solution:

%%timeit
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
# 100 loops, best of 3: 12 ms per loop

Option one:

%%timeit
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(lambda col: col.value_counts()))
 .unstack(level=1)
 .fillna(0))
# 100 loops, best of 3: 10.1 ms per loop

Option two:

%%timeit 
(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
# 100 loops, best of 3: 4.3 ms per loop

User

#2 · Edited 2024-05-29 03:16:06

A simple solution is:

employee.pivot_table(index='Title', values='Salary', aggfunc=[np.mean, np.median, min, max, np.std], fill_value=0)

In this example, several different aggregation functions are applied to the Salary column.
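A minimal runnable sketch of this multi-aggregation call follows. The `employee` data is hypothetical, and the aggregations are passed as string names rather than NumPy callables, which to my knowledge is the form recent pandas versions recommend.

```python
import pandas as pd

# Hypothetical data; 'employee', 'Title' and 'Salary' mirror the names
# used in the answer, but the rows are invented for illustration.
employee = pd.DataFrame({
    "Title":  ["Engineer", "Engineer", "Analyst", "Analyst"],
    "Salary": [100, 120, 80, 90],
})

# One output column per aggregation, all computed on Salary per Title.
summary = employee.pivot_table(
    index="Title",
    values="Salary",
    aggfunc=["mean", "median", "min", "max", "std"],
    fill_value=0,
)
print(summary)
```

Note that this summarizes one numeric column per group; it does not by itself produce the per-value counts asked for in the question.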

User

#3 · Edited 2024-05-29 03:16:06

Another solution using groupby and unstack.

df2 = pd.concat([df.groupby(['ptype',e])[e].count().unstack() for e in ['nj','wd','wpt']],axis=1).fillna(0).astype(int)    
df2.columns=pd.MultiIndex.from_product([['nj','wd','wpt'],[1.0,2.0,3.0]])

df2
Out[207]: 
       nj          wd         wpt        
      1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
ptype                                    
1       1   1   1   0   2   1   2   1   0
2       0   1   1   1   0   1   0   1   1
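The `from_product` relabeling above assumes that every column contains exactly the values 1.0, 2.0 and 3.0. A variant that lets `pd.concat` build the column labels from the data through `keys=` might look like the sketch below; `df` is again a hypothetical sample frame, and `size()` is swapped in for `count()` as an illustration rather than as the answerer's method.

```python
import pandas as pd

# Hypothetical sample frame; not taken from the original post.
df = pd.DataFrame({
    "ptype": [1, 1, 1, 2, 2],
    "nj":    [1.0, 2.0, 3.0, 2.0, 3.0],
    "wd":    [2.0, 2.0, 3.0, 1.0, 3.0],
    "wpt":   [1.0, 1.0, 2.0, 2.0, 3.0],
})

cols = ["nj", "wd", "wpt"]

# Count rows per (ptype, value) for each column, then let concat attach
# the column name as the top level of the resulting MultiIndex.
df2 = (
    pd.concat(
        [df.groupby(["ptype", c]).size().unstack(fill_value=0) for c in cols],
        axis=1,
        keys=cols,
    )
    .fillna(0)
    .astype(int)
)
print(df2)
```

Because the second column level comes from the values actually present, a column with a different set of values would still be labeled correctly instead of triggering a length mismatch.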
