Python: split a DataFrame column into two new columns and delete the original column

Posted 2024-06-01 02:10:56


I have the following DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal',
                            'Roger Federer'],
                  'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                                      '1955-08-14Price and Sons',
                                      '2000-06-28Pruitt, Bush and Mcguir']})

df[['data_time','full_company_name']] = df['birthdat/company'].str.split('[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)
df

With my code, I get the following:

____|____Name______|__birthdat/company_______________|_birthdate_|____company___________
0   |Steve Smith   |1995-01-26Sharp, Reed and Crane  |           |Sharp, Reed and Crane
1   |Joe Nadal     |1955-08-14Price and Sons         |           |Price and Sons
2   |Roger Federer |2000-06-28Pruitt, Bush and Mcguir|           |Pruitt, Bush and Mcguir

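What I want is to keep the part matched by this regex ('[0-9]{4}-[0-9]{2}-[0-9]{2}') as the date, send the rest to the "full company name" column, and drop the original column: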

____|____Name______|_birthdate_|____company_name_______
0   |Steve Smith   |1995-01-26 |Sharp, Reed and Crane
1   |Joe Nadal     |1955-08-14 |Price and Sons
2   |Roger Federer |2000-06-28 |Pruitt, Bush and Mcguir

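Updated question: how do I handle a missing birthdate or company name? For example, birthdate/company = "NaApple" or birthdate/company = "2003-01-15Na". The missing-value marker is not limited to Na.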


2 answers

You can use

df[['data_time','full_company_name']] = df['birthdat/company'].str.extract(r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)', expand=False)
>>> df
            Name  Age  ...   data_time        full_company_name
0    Steve Smith   32  ...  1995-01-26    Sharp, Reed and Crane
1      Joe Nadal   34  ...  1955-08-14           Price and Sons
2  Roger Federer   36  ...  2000-06-28  Pruitt, Bush and Mcguir

[3 rows x 5 columns]

Series.str.extract is used here because you need both parts of the string without losing the date.
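To make the difference concrete, here is a minimal sketch that runs both methods on a two-row Series. The variable names (`s`, `date_re`, `split_parts`, `extract_parts`) are illustrative and not from the original post.

```python
import pandas as pd

s = pd.Series(['1995-01-26Sharp, Reed and Crane',
               '1955-08-14Price and Sons'])
date_re = r'[0-9]{4}-[0-9]{2}-[0-9]{2}'

# split treats the date as a delimiter, so the date itself is discarded
# and the first column is just the empty string before it.
split_parts = s.str.split(date_re, expand=True)

# extract returns one column per capture group, so the date survives.
extract_parts = s.str.extract(r'^(' + date_re + r')(.*)')

print(split_parts)
print(extract_parts)
```

In both results the second column holds the company name; only `extract` also keeps the date in the first column.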

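The regex breaks down as follows:

  • ^ - start of the string
  • ([0-9]{4}-[0-9]{2}-[0-9]{2}) - your date pattern, captured into group 1
  • (.*) - the rest of the string, captured into group 2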
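The two groups can be checked on a single string with Python's built-in `re` module before applying the pattern to a whole column. This is only a quick sketch; `sample`, `date_part` and `company_part` are illustrative names.

```python
import re

pattern = re.compile(r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)')
sample = '1995-01-26Sharp, Reed and Crane'

match = pattern.match(sample)
# Group 1 is the date, group 2 is everything after it.
date_part, company_part = match.groups()

print(date_part)     # 1995-01-26
print(company_part)  # Sharp, Reed and Crane
```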


split breaks the string at the delimiters and discards them. I think you need extract with two capture groups:

df[['data_time','full_company_name']] = \
   df['birthdat/company'].str.extract('^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)')

Output:

    Name           birthdat/company                   data_time    full_company_name
    -------------  ---------------------------------  -----------  -----------------------
 0  Steve Smith    1995-01-26Sharp, Reed and Crane    1995-01-26   Sharp, Reed and Crane
 1  Joe Nadal      1955-08-14Price and Sons           1955-08-14   Price and Sons
 2  Roger Federer  2000-06-28Pruitt, Bush and Mcguir  2000-06-28   Pruitt, Bush and Mcguir
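Neither snippet drops the original column or covers the missing values raised in the updated question. A minimal sketch of one possible approach follows, under two assumptions that are not in the original post: the missing-value markers are known in advance (the hypothetical `PLACEHOLDERS` list), and the new columns use the names from the desired output (`birthdate`, `company_name`). The date group is made optional, so a row without a leading date gets NaN there.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         'NaApple',
                         '2003-01-15Na'],
})

# Assumption: the markers that mean "missing" are known up front.
PLACEHOLDERS = ['Na']
DATE_RE = r'[0-9]{4}-[0-9]{2}-[0-9]{2}'
placeholder_re = '|'.join(re.escape(p) for p in PLACEHOLDERS)

# Optional leading token: a date (captured) or a placeholder (not captured),
# followed by the rest of the string (captured).
pattern = rf'^(?:({DATE_RE})|(?:{placeholder_re}))?(.*)$'
parts = df['birthdat/company'].str.extract(pattern)

# A remainder that is empty or only a placeholder counts as missing.
company = parts[1].mask(parts[1].isin(PLACEHOLDERS + ['']))

df['birthdate'] = parts[0]
df['company_name'] = company
df = df.drop(columns='birthdat/company')  # remove the original column
print(df)
```

One caveat of this sketch: when the date is missing, a company whose name really starts with a placeholder (for example one beginning with "Na") would lose that prefix, so the placeholder list has to fit the data.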
