带有多重索引列的Pandas DataFrame - 合并层次
我有一个数据框,叫做 grouped
,它的列是多层索引,长得像下面这样:
import pandas as pd
import numpy as np
import random
codes = ["one","two","three"];
colours = ["black", "white"];
textures = ["soft", "hard"];
N= 100 # length of the dataframe
df = pd.DataFrame({ 'id' : range(1,N+1),
'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],
'code' : [random.choice(codes) for i in range(1,N+1)],
'colour': [random.choice(colours) for i in range(1,N+1)],
'texture': [random.choice(textures) for i in range(1,N+1)],
'size': [random.randint(1,100) for i in range(1,N+1)],
'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]
}, columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size'])
grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()
>> grouped
code colour size scaled_size
sum average size idxmax sum average size idxmax
0 one black 1031 60.647059 17 81 185.153944 10.891408 17 47
1 one white 481 37.000000 13 53 204.139249 15.703019 13 53
2 three black 822 48.352941 17 6 123.269405 7.251141 17 31
3 three white 1614 57.642857 28 50 285.638337 10.201369 28 37
4 two black 523 58.111111 9 85 80.908912 8.989879 9 88
5 two white 669 41.812500 16 78 82.098870 5.131179 16 78
[6 rows x 10 columns]
我想把这些列的索引合并成一个,比如变成 "Level1|Level2",比如 size|sum
,scaled_size|sum
等等。如果这样做不行,那有没有办法像我上面那样用 groupby()
,而不生成多层索引的列呢?
6 个回答
0
如果你想把操作连接起来,可以这样做:
out = (grouped.set_axis(grouped.columns.values, axis=1)
# If you want to preserve order and strip the leading |
.rename(columns=lambda col: '|'.join(col).strip('|'))
# or if you don't care the extra |
#.rename(columns='|'.join)
# If you want to change the order and strip the leading |
#.rename(columns=lambda col: f'{col[1]}|{col[0]}'.strip('|'))
# or the order matters and you don't care the extra |
#.rename(columns='{0[1]}|{0[0]}'.format)
)
print(out)
code colour size|sum size|average size|size size|idxmax scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax
0 one black 620 41.333333 15 24 7727 515.133333 15 48
1 one white 678 45.200000 15 37 8290 552.666667 15 17
2 three black 957 43.500000 22 34 11899 540.863636 22 0
3 three white 918 54.000000 17 12 8017 471.588235 17 63
4 two black 1009 63.062500 16 73 8954 559.625000 16 35
5 two white 601 40.066667 15 90 8729 581.933333 15 96
6
这段话的意思是,感谢suraj给出的简洁回答,大家可以点击这个链接查看详细内容:https://stackoverflow.com/a/72616083/317797
df.columns = df.columns.map('_'.join)
17
根据Scott Boston的回答, 这里有个小更新(这段代码适用于两个或更多层的列):
temp.columns.map(lambda x: '|'.join([str(i) for i in x]))
谢谢你,Boston!
42
你可以随时更改列的设置:
grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]
180
有很多种方法可以把 MultiIndex 列变成单层列,这些方法更符合 Python 的风格。
- 可以使用 map 和 join 来处理字符串类型的列名:
grouped.columns = grouped.columns.map('|'.join).str.strip('|')
print(grouped)
输出结果:
code colour size|sum size|average size|size size|idxmax \
0 one black 862 53.875000 16 14
1 one white 554 46.166667 12 18
2 three black 842 49.529412 17 90
3 three white 740 56.923077 13 97
4 two black 1541 61.640000 25 50
scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax
0 6980 436.250000 16 77
1 6101 508.416667 12 13
2 7889 464.058824 17 64
3 6329 486.846154 13 73
4 12809 512.360000 25 23
- 对于包含数字类型数据的列名,可以使用 map 和 format。
grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format)
输出结果:
code| colour| size|sum size|average size|size size|idxmax \
0 one black 734 52.428571 14 30
1 one white 1110 65.294118 17 88
2 three black 930 51.666667 18 3
3 three white 1140 51.818182 22 20
4 two black 656 38.588235 17 77
5 two white 704 58.666667 12 17
scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax
0 8229 587.785714 14 57
1 8781 516.529412 17 73
2 10743 596.833333 18 21
3 10240 465.454545 22 26
4 9982 587.176471 17 16
5 6537 544.750000 12 49
- 如果你用的是 Python 3.6 及以上版本,可以使用列表推导和 f-string:
grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns]
输出结果:
code colour size|sum size|average size|size size|idxmax \
0 one black 1003 43.608696 23 76
1 one white 1255 59.761905 21 66
2 three black 777 45.705882 17 39
3 three white 630 52.500000 12 23
4 two black 823 54.866667 15 33
5 two white 491 40.916667 12 64
scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax
0 12532 544.869565 23 27
1 13223 629.666667 21 13
2 8615 506.764706 17 92
3 6101 508.416667 12 43
4 7661 510.733333 15 42
5 6143 511.916667 12 49