pandas: groupby with aggregate statistics across all DataFrame columns

    A  B  C  D  E  F
0  aa  5  3  2  2  2
1  aa  3  2  2  3  3
2  ac  2  0  2  7  7
3  ac  9  2  3  8  8
4  ac  2  3  7  3  3
5  ad  0  0  0  1  1
6  ad  9  9  9  9  9
7  ad  6  6  6  6  6
8  ad  3  3  3  3  3

    A  count_B    mean_B  count_C    mean_C  count_D  mean_D  etc...
0  aa        2  4.000000        2  2.500000        2     2.0  etc...
1  ac        3  4.333333        3  2.500000        3     4.0
2  ad        4  4.500000        4  2.500000        4     4.5

import pandas as pd
import numpy as np
import pprint as pp

test_dataframe = pd.DataFrame({
    'A' : ['aa', 'aa', 'ac', 'ac', 'ac', 'ad', 'ad', 'ad', 'ad'],
    'B' : [5, 3, 2, 9, 2, 0, 9, 6, 3],
    'C' : [3, 2, 0, 2, 3, 0, 9, 6, 3],
    'D' : [2, 2, 2, 3, 7, 0, 9, 6, 3],
    'E' : [2, 3, 7, 8, 3, 1, 9, 6, 3],
    'F' : [2, 3, 7, 8, 3, 1, 9, 6, 3]
})

# group, aggregate, convert object to df, sort index
grouped = test_dataframe.groupby(['A'])
grouped_stats = grouped['C'].agg([np.mean, len])
grouped_stats = pd.DataFrame(grouped_stats).reset_index()
grouped_stats.rename(columns = {'mean':'mean_C', 'len':'count_C'}, inplace=True)
grouped_stats.sort_index(axis=1, inplace=True)

# print() with parentheses runs on both Python 2 and Python 3
print("Input: ")
pp.pprint(test_dataframe)
print("Output: ")
pp.pprint(grouped_stats)

1 answer

User

#1 · Posted on 2024-04-20 13:20:34

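You don't have to call grouped['B'], grouped['C'], and so on one at a time. Call agg on the whole groupby object and pandas will apply the aggregation functions to every remaining column.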

import pandas as pd

test_dataframe = pd.DataFrame({
    'A' : ['aa', 'aa', 'ac', 'ac', 'ac', 'ad', 'ad', 'ad', 'ad'],
    'B' : [5, 3, 2, 9, 2, 0, 9, 6, 3],
    'C' : [3, 2, 0, 2, 3, 0, 9, 6, 3],
    'D' : [2, 2, 2, 3, 7, 0, 9, 6, 3],
    'E' : [2, 3, 7, 8, 3, 1, 9, 6, 3],
    'F' : [2, 3, 7, 8, 3, 1, 9, 6, 3]
})
agg_funcs = ['count', 'mean']
test_dataframe = test_dataframe.groupby(['A']).agg(agg_funcs)

columns = 'B C D E F'.split()
names = [y + '_' + x for x in columns for y in agg_funcs]
test_dataframe.columns = names

Out[89]: 
    count_B  mean_B  count_C  mean_C  count_D  mean_D  count_E  mean_E  count_F  mean_F
A                                                                                      
aa        2  4.0000        2  2.5000        2     2.0        2    2.50        2    2.50
ac        3  4.3333        3  1.6667        3     4.0        3    6.00        3    6.00
ad        4  4.5000        4  4.5000        4     4.5        4    4.75        4    4.75
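
The column names above are built from a hard-coded list ('B C D E F'), so they have to be kept in sync with the data by hand. As a minimal sketch of an alternative, assuming the same test data, the names can be derived from the (column, function) MultiIndex that agg returns. The variable name stats and the final reset_index() call (to turn A back into an ordinary column, as in the question's desired output) are additions for illustration, not part of the original answer.

```python
import pandas as pd

test_dataframe = pd.DataFrame({
    'A': ['aa', 'aa', 'ac', 'ac', 'ac', 'ad', 'ad', 'ad', 'ad'],
    'B': [5, 3, 2, 9, 2, 0, 9, 6, 3],
    'C': [3, 2, 0, 2, 3, 0, 9, 6, 3],
    'D': [2, 2, 2, 3, 7, 0, 9, 6, 3],
    'E': [2, 3, 7, 8, 3, 1, 9, 6, 3],
    'F': [2, 3, 7, 8, 3, 1, 9, 6, 3],
})

agg_funcs = ['count', 'mean']
stats = test_dataframe.groupby('A').agg(agg_funcs)

# After agg with a list of functions, the columns form a MultiIndex of
# (column, function) pairs: ('B', 'count'), ('B', 'mean'), ('C', 'count'), ...
# Flatten each pair into a single 'function_column' name.
stats.columns = [f'{func}_{col}' for col, func in stats.columns]

# Turn the group key 'A' from the index back into a regular column.
stats = stats.reset_index()
print(stats)
```

Because the names come from the result itself, adding or removing a column in the input needs no change to the renaming step.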
