Column with multiple dates

Posted 2024-04-25 12:10:51

I have a DataFrame with a column that looks like this:

Event date
1/3/2013
11/01/2011-10/01/2012
11/01/2011-10/01/2012
11/01/2011-10/01/2012
10/01/2012 - 02/18/2013
2/12/2013
01/18/2013-01/23/2013
11/01/2012-01/19/2013

Is there a good way to split the dates into two columns, like this:

df['Start date']
df['end date']

where a row with a single date uses that date as the start date by default?


2 answers

You can also use Series.str.extract() here to do it all in one go:

In [22]: df
Out[22]:
                event_date
0                 1/3/2013
1    11/01/2011-10/01/2012
2    11/01/2011-10/01/2012
3    11/01/2011-10/01/2012
4  10/01/2012 - 02/18/2013
5                2/12/2013
6    01/18/2013-01/23/2013
7    11/01/2012-01/19/2013

In [23]: df.event_date.str.extract(r'(?P<all>(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?)')
Out[23]:
                       all       start         end
0                 1/3/2013    1/3/2013         NaN
1    11/01/2011-10/01/2012  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  10/01/2012  02/18/2013
5                2/12/2013   2/12/2013         NaN
6    01/18/2013-01/23/2013  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  11/01/2012  01/19/2013
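The extracted columns are still strings. A minimal sketch of turning them into real dates follows; it assumes the dates are month/day/year (which sample values such as 02/18/2013 suggest), and it uses the column names from the question. The shortened sample data and the `parts` variable are only for illustration.

```python
import pandas as pd

# Shortened copy of the sample column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
    "2/12/2013",
]})

# Same idea as the pattern above, without the outer "all" group:
# a start date, an optional dash, and an optional end date.
pattern = (r"(?P<start>\d{1,2}/\d{1,2}/\d{4})"
           r"\s*-?\s*"
           r"(?P<end>\d{1,2}/\d{1,2}/\d{4})?")
parts = df["event_date"].str.extract(pattern)

# Assumption: month/day/year. Missing end dates become NaT.
df["Start date"] = pd.to_datetime(parts["start"], format="%m/%d/%Y")
df["end date"] = pd.to_datetime(parts["end"], format="%m/%d/%Y")
```

Rows with a single date end up with that date in "Start date" and NaT in "end date", which matches the default asked for in the question.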

You can use the vectorized string split method as follows:

>>> df

                event_date  x
0                 1/3/2013  1
1    11/01/2011-10/01/2012  1
2    11/01/2011-10/01/2012  1
3    11/01/2011-10/01/2012  1
4  10/01/2012 - 02/18/2013  1
5                2/12/2013  1
6    01/18/2013-01/23/2013  1
7    11/01/2012-01/19/2013  1


>>> df['beg'] = df['event_date'].str.split(r'\s*-\s*').str[0]
>>> df['end'] = df['event_date'].str.split(r'\s*-\s*').str[1]
>>> df

                event_date  x         beg         end
0                 1/3/2013  1    1/3/2013         NaN
1    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  1  10/01/2012  02/18/2013
5                2/12/2013  1   2/12/2013         NaN
6    01/18/2013-01/23/2013  1  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  1  11/01/2012  01/19/2013
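The two split calls can also be collapsed into one. This is only a sketch, not part of the original answer: it assumes each value contains at most one dash separator, and it relies on the `n` and `expand` parameters of `Series.str.split`. The shortened sample data and the `parts` variable are illustrative.

```python
import pandas as pd

# Shortened copy of the sample column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
    "2/12/2013",
]})

# Split once on a dash with optional surrounding whitespace.
# expand=True returns a DataFrame with integer column labels 0 and 1;
# rows without a dash get a missing value in column 1.
parts = df["event_date"].str.split(r"\s*-\s*", n=1, expand=True)

df["beg"] = parts[0]
df["end"] = parts[1]
```

Note that if no row contained a dash at all, `parts` would have only column 0, so `parts[1]` would need a guard in that case.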

Edit: As @DSM pointed out, you can also do the following:

>>> pd.DataFrame(df['event_date'].str.split(r'\s*-\s*').tolist(),
                  columns=['beg','end'])

          beg         end
0    1/3/2013        None
1  11/01/2011  10/01/2012
2  11/01/2011  10/01/2012
3  11/01/2011  10/01/2012
4  10/01/2012  02/18/2013
5   2/12/2013        None
6  01/18/2013  01/23/2013
7  11/01/2012  01/19/2013
