Python: the DataFrame index is a combination of three columns. How can I split them apart?

Posted 2024-04-19 21:00:14


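I have a DataFrame like the one shown below.

Its index is made up of three variables: "Date", "Time" and "Status".

I have two questions:

1) Can I convert this index into three regular columns?

2) Can I combine "Date" and "Time" into a new time series index?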

Dataframe example

In [19]: mydf
Out[19]: 
                                   Entry        Exit
Date     Time     Status                            
01-01-13 03:22:00 REGULAR     57308825.0  43088232.0
         07:22:00 REGULAR     57309553.0  43088773.0
         11:22:00 REGULAR     57310969.0  43089369.0
         15:22:00 REGULAR     57313168.0  43090433.0
         19:22:00 REGULAR     57315755.0  43092329.0
         23:22:00 REGULAR     57317073.0  43093590.0
01-02-13 03:22:00 REGULAR     57317475.0  43094076.0
         07:22:00 REGULAR     57319975.0  43094558.0
         07:37:45 DOOR OPEN    1524176.0    672377.0
         07:37:46 DOOR CLOSE   1524176.0    672377.0
         11:22:00 REGULAR     57326441.0  43097377.0
         15:22:00 REGULAR     57331818.0  43100057.0
         19:22:00 REGULAR     57337786.0  43103918.0
         23:22:00 REGULAR     57339290.0  43105700.0
01-03-13 03:22:00 REGULAR     57339676.0  43106258.0
         07:22:00 REGULAR     57342339.0  43106791.0
         09:37:39 REGULAR     23317856.0  15286811.0
         11:22:00 REGULAR     57349205.0  43109943.0
         15:22:00 REGULAR     57354752.0  43112434.0
         15:22:01 DOOR OPEN       4558.0         0.0
         15:27:12 LOGON           4558.0         0.0
         15:28:04 LGF-MAN         4558.0         0.0
         15:28:07 DOOR CLOSE      4558.0         0.0
         15:28:08 DOOR OPEN       4558.0         0.0
         15:28:21 DOOR CLOSE      4558.0         0.0
         18:25:09 DOOR OPEN    7171535.0   2420826.0
         18:27:14 DOOR CLOSE   7171535.0   2420826.0
         18:27:42 DOOR OPEN    7171535.0   2420826.0
         18:27:44 DOOR CLOSE   7171535.0   2420826.0
         18:27:45 DOOR OPEN    7171535.0   2420826.0

2 answers

Source data:

In [46]: x
Out[46]:
                                  a         b
(01-01-13, 03:22:00, aaa)  0.773264  0.074545
(01-02-13, 03:22:00, bbb)  0.070720  0.806460
(01-03-13, 13:33:00, ccc)  0.645056  0.374402

Can I convert this index into three general columns?

In [50]: x[['i1','i2','i3']] = pd.DataFrame(x.index.values.tolist()).values

In [51]: x
Out[51]:
                                  a         b        i1        i2   i3
(01-01-13, 03:22:00, aaa)  0.773264  0.074545  01-01-13  03:22:00  aaa
(01-02-13, 03:22:00, bbb)  0.070720  0.806460  01-02-13  03:22:00  bbb
(01-03-13, 13:33:00, ccc)  0.645056  0.374402  01-03-13  13:33:00  ccc
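
The question's printout shows named index levels (Date, Time, Status), which suggests a real MultiIndex rather than a flat index of tuples. Under that assumption, `reset_index()` may be a shorter route to three regular columns. A minimal sketch, where the two-row sample frame is made up to mirror the question's data:

```python
import pandas as pd

# Hypothetical two-row sample shaped like the question's frame (an assumption,
# not the asker's real data).
idx = pd.MultiIndex.from_tuples(
    [("01-01-13", "03:22:00", "REGULAR"),
     ("01-02-13", "07:37:45", "DOOR OPEN")],
    names=["Date", "Time", "Status"],
)
mydf = pd.DataFrame(
    {"Entry": [57308825.0, 1524176.0], "Exit": [43088232.0, 672377.0]},
    index=idx,
)

# reset_index() moves every index level into an ordinary column and
# replaces the index with a default integer index.
flat = mydf.reset_index()
print(flat.columns.tolist())
```

The level names become the column names, so no manual naming such as i1/i2/i3 is needed in that case.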

Can I combine "Date" and "Time" together as a new time series index?

In [53]: x = x.set_index(pd.to_datetime(x.i1 + ' ' + x.i2, errors='coerce'))

In [54]: x
Out[54]:
                            a         b        i1        i2   i3
2013-01-01 03:22:00  0.773264  0.074545  01-01-13  03:22:00  aaa
2013-01-02 03:22:00  0.070720  0.806460  01-02-13  03:22:00  bbb
2013-01-03 13:33:00  0.645056  0.374402  01-03-13  13:33:00  ccc
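
A note on `errors='coerce'` as used above: it makes `pd.to_datetime` return NaT for values it cannot parse instead of raising. The sketch below illustrates this on a made-up Series; the explicit `format` string is an assumption based on the month-first dates seen in the output above.

```python
import pandas as pd

# Made-up input: one valid timestamp string and one unparseable value.
raw = pd.Series(["01-02-13 03:22:00", "not a timestamp"])

# With errors="coerce", values that do not match the format become NaT.
parsed = pd.to_datetime(raw, format="%m-%d-%y %H:%M:%S", errors="coerce")

print(parsed.isna().tolist())  # [False, True]
```

Since NaT entries end up in the index silently, it may be worth checking `parsed.isna().any()` before calling `set_index`.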

You can also do this directly from the index values:

In [56]: x
Out[56]:
                                  a         b
(01-01-13, 03:22:00, aaa)  0.735934  0.488669
(01-02-13, 03:22:00, bbb)  0.214214  0.234158
(01-03-13, 13:33:00, ccc)  0.071282  0.747958

In [57]: x = x.set_index(pd.to_datetime(x.index.str[0] + ' ' + x.index.str[1], errors='coerce'))

In [58]: x
Out[58]:
                            a         b
2013-01-01 03:22:00  0.735934  0.488669
2013-01-02 03:22:00  0.214214  0.234158
2013-01-03 13:33:00  0.071282  0.747958
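
For a frame whose index is a named MultiIndex (as the question's printout suggests), both steps can be combined. This is only a sketch under that assumption: the sample data is made up, and the index name "Timestamp" is a hypothetical choice.

```python
import pandas as pd

# Hypothetical sample mirroring the question's frame.
idx = pd.MultiIndex.from_tuples(
    [("01-01-13", "03:22:00", "REGULAR"),
     ("01-02-13", "07:37:45", "DOOR OPEN")],
    names=["Date", "Time", "Status"],
)
mydf = pd.DataFrame(
    {"Entry": [57308825.0, 1524176.0], "Exit": [43088232.0, 672377.0]},
    index=idx,
)

# Turn the three index levels into columns.
flat = mydf.reset_index()

# Build timestamps from Date and Time; month-first format is assumed.
stamps = pd.to_datetime(flat["Date"] + " " + flat["Time"],
                        format="%m-%d-%y %H:%M:%S")
stamps.name = "Timestamp"  # hypothetical name for the new index

# Keep Status as a regular column and index the rows by timestamp.
result = flat.drop(columns=["Date", "Time"]).set_index(stamps)
print(result)
```

Keeping Status as a column avoids losing it when Date and Time are replaced by the DatetimeIndex.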

We can start from:

>>> my_df = pd.DataFrame({'Entry': [5.73], 'Exit': [4.31]},
...                      index=[('01-01-13', '03:22:00', 'REGULAR')])
>>> my_df
                               Entry  Exit
(01-01-13, 03:22:00, REGULAR)   5.73  4.31

1) To convert to three columns:

>>> combined = pd.concat([my_df.reset_index()['index'].apply(pd.Series),
...                       my_df.reset_index(drop=True)],
...                      axis=1)
>>> combined
          0         1        2  Entry  Exit
0  01-01-13  03:22:00  REGULAR   5.73  4.31

You will now have three new columns, named 0, 1 and 2.

2) To combine the date and time into a DatetimeIndex:

>>> combined.set_index(pd.to_datetime(combined[0] + ' ' + combined[1]))
                            0         1        2  Entry  Exit
2013-01-01 03:22:00  01-01-13  03:22:00  REGULAR   5.73  4.31

If needed, you can now drop columns 0 and 1 and rename column 2.
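
That cleanup step could look roughly like the sketch below. The frame is rebuilt by hand to resemble `combined` from above, the new column name "Status" is taken from the question, and the explicit month-first `format` is an assumption.

```python
import pandas as pd

# Rebuild a frame shaped like `combined`, with integer column labels 0, 1, 2.
combined = pd.DataFrame(
    [["01-01-13", "03:22:00", "REGULAR", 5.73, 4.31]],
    columns=[0, 1, 2, "Entry", "Exit"],
)

# Index by the combined date and time.
indexed = combined.set_index(
    pd.to_datetime(combined[0] + " " + combined[1],
                   format="%m-%d-%y %H:%M:%S")
)

# Drop the now-redundant date/time columns and give column 2 a real name.
tidy = indexed.drop(columns=[0, 1]).rename(columns={2: "Status"})
print(tidy)
```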
