Python: split a DataFrame column into two new columns and delete the original column

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
                   'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                                        '1955-08-14Price and Sons',
                                        '2000-06-28Pruitt, Bush and Mcguir']})
df[['data_time','full_company_name']] = df['birthdat/company'].str.split('[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)
df

2 answers

User

#1 · edited 2024-06-01 02:10:56

You can use

df[['data_time','full_company_name']] = df['birthdat/company'].str.extract(r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)', expand=False)
>>> df
            Name  Age  ...   data_time        full_company_name
0    Steve Smith   32  ...  1995-01-26    Sharp, Reed and Crane
1      Joe Nadal   34  ...  1955-08-14           Price and Sons
2  Roger Federer   36  ...  2000-06-28  Pruitt, Bush and Mcguir

[3 rows x 5 columns]

str.extract is used here because you need to get both parts without losing the date.

The regex is

^ - the start of the string
([0-9]{4}-[0-9]{2}-[0-9]{2}) - your date pattern, captured into group 1
(.*) - the rest of the string, captured into group 2

See the regex demo
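
To check the pattern outside pandas, here is a minimal sketch using the standard `re` module. The helper name `split_date_prefix` is made up for illustration and is not part of the answer; the pattern and the sample string come from the thread.

```python
import re

# Same pattern as in the answer: a date at the start, then everything else.
PATTERN = re.compile(r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)')

def split_date_prefix(value):
    """Return (date, rest) if value starts with YYYY-MM-DD, else (None, None)."""
    match = PATTERN.match(value)
    if match is None:
        return None, None
    # Group 1 is the date, group 2 is the remainder of the string.
    return match.group(1), match.group(2)

date_part, company_part = split_date_prefix('1995-01-26Sharp, Reed and Crane')
print(date_part)     # 1995-01-26
print(company_part)  # Sharp, Reed and Crane
```

Because both pieces sit in capture groups, nothing is thrown away, which is the property str.extract relies on.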

User

#2 · edited 2024-06-01 02:10:56

split breaks a string apart at the delimiters and discards the delimiters themselves. I think you need extract with two capture groups:

df[['data_time','full_company_name']] = \
   df['birthdat/company'].str.extract('^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)')

Output:

   Name           birthdat/company                   data_time   full_company_name
0  Steve Smith    1995-01-26Sharp, Reed and Crane    1995-01-26  Sharp, Reed and Crane
1  Joe Nadal      1955-08-14Price and Sons           1955-08-14  Price and Sons
2  Roger Federer  2000-06-28Pruitt, Bush and Mcguir  2000-06-28  Pruitt, Bush and Mcguir
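
To make the split/extract difference concrete, and to cover the "delete the original column" part of the title, here is a minimal sketch. The sample data is copied from the question. Removing the column with `DataFrame.drop` is my own assumption about what the asker wants, since neither answer shows that step, and `split_result` is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         '1955-08-14Price and Sons',
                         '2000-06-28Pruitt, Bush and Mcguir'],
})

# str.split treats the date as a delimiter and discards it: column 0 is the
# empty string before the date, column 1 is the company name after it.
split_result = df['birthdat/company'].str.split(
    r'[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)

# str.extract keeps both pieces, one output column per capture group.
pattern = r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)'
df[['data_time', 'full_company_name']] = df['birthdat/company'].str.extract(pattern)

# Drop the original combined column (assumed to be the goal of the title).
df = df.drop(columns=['birthdat/company'])
print(df)
```

After this runs, `df` should have only the Name, data_time and full_company_name columns, while `split_result` shows why the original attempt lost the dates.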
