将多个列展开为行 - 多重索引
我有一个数据框,结构如下,有多个索引
mx_dict = pd.read_excel('ABC Bookings.xlsx', header=[1,2], sheet_name=None)
如果需要,可以查看这个Excel文件:excel文件
mx = pd.concat(mx_dict.values(), axis=0)
print(mx)
Unnamed: 0_level_0 Unnamed: 1_level_0 January February \
Client Client Full Name Gross Net Billable Gross
0 ABC Client ABC 500.0 400.0 800 NaN
1 ABC Client ABC NaN NaN 0 300.0
Net Billable
0 NaN NaN
1 100.0 600.0
下面是数据框的字典
mx.to_dict('list')
{('Unnamed: 0_level_0', 'Client'): ['ABC', 'ABC'],
('Unnamed: 1_level_0', 'Client Full Name'): ['Client ABC', 'Client ABC'],
('January', 'Gross '): [500.0, nan],
('January', 'Net'): [400.0, nan],
('January', 'Billable'): [800, 0],
('February', 'Gross '): [nan, 300.0],
('February', 'Net'): [nan, 100.0],
('February', 'Billable'): [nan, 600.0]}
第0层 - 前两列没有名字,后面的列是每个月的名字,从一月到十二月
第1层 - 这里面包括客户的代码和名字,还有每个月的总收入/净收入/可计费金额
我想把月份和数值“展开”,并作为额外的行添加,像下面这样
|Client | Client Full Name | Month | Gross | Net | Billable
|ABC | Client ABC | January | 500 | 400 | 800
|ABC | Client ABC | February| 300 | 100 | 600
我尝试了下面的建议,但没有成功,可能是因为我在最初的帖子中没有说明我的数据框是怎么结构的
mx1 = mx.stack(level=0).reset_index().rename(columns={"level_1": "Month"})
我还尝试了其他各种方法,包括展开和透视,但似乎无法得到我想要的结果。有什么建议吗?
这只是一个示例,因为我不能分享完整的细节,但这个Excel文件最终会有数百个不同的客户和数值
(如果你需要我分享更多代码,请告诉我)
谢谢
1 个回答
3
假设这是你的数据框:
nan = np.nan
d = {
("Unnamed: 0_level_0", "Client"): ["ABC", "ABC"],
("Unnamed: 1_level_0", "Client Full Name"): ["Client ABC", "Client ABC"],
("January", "Gross "): [500.0, nan],
("January", "Net"): [400.0, nan],
("January", "Billable"): [800, 0],
("February", "Gross "): [nan, 300.0],
("February", "Net"): [nan, 100.0],
("February", "Billable"): [nan, 600.0],
}
df = pd.DataFrame(d)
Unnamed: 0_level_0 Unnamed: 1_level_0 January February
Client Client Full Name Gross Net Billable Gross Net Billable
0 ABC Client ABC 500.0 400.0 800 NaN NaN NaN
1 ABC Client ABC NaN NaN 0 300.0 100.0 600.0
你可以先用 .set_index
方法,把前两列(与客户名称相关的列)设置为索引。然后再使用 .stack
和 .reset_index
方法:
client_tuple_col = ("Unnamed: 0_level_0", "Client")
client_full_tuple_col = ("Unnamed: 1_level_0", "Client Full Name")
df = df.set_index([client_tuple_col, client_full_tuple_col])
df = (
df.stack(level=0)
.reset_index()
.rename(
columns={
"level_2": "Month",
client_tuple_col: client_tuple_col[1],
client_full_tuple_col: client_full_tuple_col[1],
}
)
.dropna()
)
Client Client Full Name Month Gross Net Billable
0 ABC Client ABC January 500.0 400.0 800.0
1 ABC Client ABC February 300.0 100.0 600.0