Pandas MultiIndex: how to align columns under a column MultiIndex

Published 2024-04-18 02:11:52


My initial DataFrame looks like this:

import pandas as pd


df = pd.DataFrame(data=[['Core','PM2',1234,'Direct','2019-11-08 00:00:00','2019-11-08 00:59:59',3.300,'V'],['Long Term','Wind',1111,'Direct','2019-11-09 00:00:00','2019-11-09 00:59:59',0.00123,'V']], 
                  columns=['Program','Parameter','Station','Method','Start','End','Measurement','Flag'])
df
      Program   Parameter   Station Method                Start                 End Measurement Flag
0        Core         PM2      1234 Direct  2019-11-08 00:00:00 2019-11-08 00:59:59     3.30000    V
1   Long Term        Wind      1111 Direct  2019-11-09 00:00:00 2019-11-09 00:59:59     0.00123    V

Then I set an index on the DataFrame:

df_index = df.set_index(['Start','End','Measurement','Flag'])
df_index

This gives me:

                                                              Program   Parameter   Station Method
              Start                 End Measurement Flag                
2019-11-08 00:00:00 2019-11-08 00:59:59     3.30000    V         Core         PM2      1234 Direct
2019-11-09 00:00:00 2019-11-09 00:59:59     0.00123    V    Long Term        Wind      1111 Direct

Then I create a MultiIndex for the columns:

df_columns = pd.MultiIndex.from_frame(df_index[['Program','Parameter','Station','Method']])

Then I create a new DataFrame using that MultiIndex:

data = pd.DataFrame(df_index, index=df_index.index, columns=df_columns)
data

This gives me:

                                                      Program     Core  Long Term
                                                    Parameter      PM2       Wind
                                                      Station     1234       1111
                                                       Method   Direct     Direct
              Start                 End Measurement      Flag       
2019-11-08 00:00:00 2019-11-08 00:59:59     3.30000         V      NaN        NaN
2019-11-09 00:00:00 2019-11-09 00:59:59     0.00123         V      NaN        NaN

What I want is for the MultiIndex columns Program, Parameter, Station and Method to group each Measurement and Flag beneath them, with Start and End as the index:

                                         Program       Core        Long Term
                                       Parameter        PM2             Wind
                                         Station       1234             1111
                                          Method     Direct           Direct
              Start                 End         Measurement Flag Measurement Flag
2019-11-08 00:00:00 2019-11-08 00:59:59             3.30000    V     
2019-11-09 00:00:00 2019-11-09 00:59:59                              0.00123    V   

Any help would be greatly appreciated.


1 answer
User
#1 · Posted 2024-04-18 02:11:52

You can try a sequence of stack/unstack operations:

import pandas as pd
df = pd.DataFrame(data=[['Core','PM2',1234,'Direct','2019-11-08 00:00:00','2019-11-08 00:59:59',3.300,'V'],['Long Term','Wind',1111,'Direct','2019-11-09 00:00:00','2019-11-09 00:59:59',0.00123,'V']], columns=['Program','Parameter','Station','Method','Start','End','Measurement','Flag'])
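# Put Start/End and the four grouping keys in the index; Measurement and Flag remain as columns.
df_index = df.set_index(['Start','End', 'Program','Parameter','Station','Method'])
# Move the four keys into the columns, stack the outermost column level (Measurement/Flag)
# into the rows, then unstack it again so it becomes the innermost column level.
df_index.unstack([-4, -3, -2, -1]).stack(-5).unstack(-1)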
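
The negative level positions in that chain can be hard to follow. Here is a sketch of what should be the same reshaping, referring to levels by name instead. The names `group_levels`, `Field`, `indexed`, `wide` and `result` are made up for this example and are not part of the original answer. Newer pandas versions may emit a FutureWarning about `stack`, so treat this as an illustration rather than a definitive recipe.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['Core', 'PM2', 1234, 'Direct', '2019-11-08 00:00:00', '2019-11-08 00:59:59', 3.300, 'V'],
        ['Long Term', 'Wind', 1111, 'Direct', '2019-11-09 00:00:00', '2019-11-09 00:59:59', 0.00123, 'V'],
    ],
    columns=['Program', 'Parameter', 'Station', 'Method', 'Start', 'End', 'Measurement', 'Flag'],
)

# Hypothetical helper: the keys that should become the upper column levels.
group_levels = ['Program', 'Parameter', 'Station', 'Method']

# Step 1: index by Start/End plus the grouping keys. Measurement and Flag stay
# as columns; naming that column axis 'Field' (an assumed name) lets later steps
# refer to it by name instead of by position.
indexed = df.set_index(['Start', 'End'] + group_levels).rename_axis(columns='Field')

# Step 2: move the grouping keys from the rows into the columns.
# Column levels are now: Field, Program, Parameter, Station, Method.
wide = indexed.unstack(group_levels)

# Step 3: push 'Field' down into the rows and bring it back up, which places it
# below the grouping keys as the innermost column level.
result = wide.stack('Field').unstack('Field')

print(list(result.columns.names))
print(result)
```

Cells that have no data for a given Program/Parameter/Station/Method combination come out as NaN, which corresponds to the blank cells in the desired layout.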

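
As for why the attempt in the question produced only NaN: as far as I can tell, when an existing DataFrame is passed to `pd.DataFrame` together with `columns=`, pandas selects those labels from the frame rather than relabeling it. Labels such as `('Core', 'PM2', 1234, 'Direct')` are not columns of `df_index`, so every cell comes back empty. The toy frame below (`small`, with made-up column names) sketches that behaviour on a simpler case.

```python
import pandas as pd

# Toy frame for illustration only; not the data from the question.
small = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Passing a DataFrame plus columns= selects those labels from it.
# 'x' and 'y' do not exist in `small`, so both columns are filled with NaN.
selected = pd.DataFrame(small, columns=['x', 'y'])
print(selected.isna().all().all())  # True

# Relabeling while keeping the values is a different operation.
renamed = small.set_axis(['x', 'y'], axis=1)
print(renamed)
```

So building the column MultiIndex separately and passing it to the constructor does not move any values. The values have to be reshaped, for example with stack/unstack.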
