将多个列展开为行 - 多重索引

0 投票
1 回答
66 浏览
提问于 2025-04-14 16:33

我有一个数据框,结构如下,有多个索引

mx_dict = pd.read_excel('ABC Bookings.xlsx', header=[1,2], sheet_name=None)

如果需要,可以查看这个Excel文件:excel文件

mx = pd.concat(mx_dict.values(), axis=0)
print(mx)

Unnamed: 0_level_0 Unnamed: 1_level_0 January                 February  \
              Client   Client Full Name  Gross     Net Billable   Gross    
0                ABC         Client ABC   500.0  400.0      800      NaN   
1                ABC         Client ABC     NaN    NaN        0    300.0   

                   
     Net Billable  
0    NaN      NaN  
1  100.0    600.0  

下面是数据框的字典

mx.to_dict('list')

{('Unnamed: 0_level_0', 'Client'): ['ABC', 'ABC'],
 ('Unnamed: 1_level_0', 'Client Full Name'): ['Client ABC', 'Client ABC'],
 ('January', 'Gross '): [500.0, nan],
 ('January', 'Net'): [400.0, nan],
 ('January', 'Billable'): [800, 0],
 ('February', 'Gross '): [nan, 300.0],
 ('February', 'Net'): [nan, 100.0],
 ('February', 'Billable'): [nan, 600.0]}

第0层 - 前两列没有名字,后面的列是每个月的名字,从一月到十二月

第1层 - 这里面包括客户的代码和名字,还有每个月的总收入/净收入/可计费金额

我想把月份和数值“展开”,并作为额外的行添加,像下面这样

|Client | Client Full Name | Month   | Gross | Net | Billable
|ABC    | Client ABC       | January | 500   | 400 | 800
|ABC    | Client ABC       | February| 300   | 100 | 600

我尝试了下面的建议,但没有成功,可能是因为我在最初的帖子中没有说明我的数据框是怎么结构的

mx1 = mx.stack(level=0).reset_index().rename(columns={"level_1": "Month"})

我还尝试了其他各种方法,包括展开和透视,但似乎无法得到我想要的结果。有什么建议吗?

这只是一个示例,因为我不能分享完整的细节,但这个Excel文件最终会有数百个不同的客户和数值

(如果你需要我分享更多代码,请告诉我)

谢谢

1 个回答

3

假设这是你的数据框:

nan = np.nan

d = {
    ("Unnamed: 0_level_0", "Client"): ["ABC", "ABC"],
    ("Unnamed: 1_level_0", "Client Full Name"): ["Client ABC", "Client ABC"],
    ("January", "Gross "): [500.0, nan],
    ("January", "Net"): [400.0, nan],
    ("January", "Billable"): [800, 0],
    ("February", "Gross "): [nan, 300.0],
    ("February", "Net"): [nan, 100.0],
    ("February", "Billable"): [nan, 600.0],
}

df = pd.DataFrame(d)
  Unnamed: 0_level_0 Unnamed: 1_level_0 January                 February                
              Client   Client Full Name  Gross     Net Billable   Gross     Net Billable
0                ABC         Client ABC   500.0  400.0      800      NaN    NaN      NaN
1                ABC         Client ABC     NaN    NaN        0    300.0  100.0    600.0

你可以先用 .set_index 方法,把前两列(与客户名称相关的列)设置为索引。然后再使用 .stack.reset_index 方法:

client_tuple_col = ("Unnamed: 0_level_0", "Client")
client_full_tuple_col = ("Unnamed: 1_level_0", "Client Full Name")

df = df.set_index([client_tuple_col, client_full_tuple_col])
df = (
    df.stack(level=0)
    .reset_index()
    .rename(
        columns={
            "level_2": "Month",
            client_tuple_col: client_tuple_col[1],
            client_full_tuple_col: client_full_tuple_col[1],
        }
    )
    .dropna()
)
  Client Client Full Name     Month  Gross     Net  Billable
0    ABC       Client ABC   January   500.0  400.0     800.0
1    ABC       Client ABC  February   300.0  100.0     600.0

撰写回答