使用混合类型将列转换为日期。“to_datetime”不起作用

2024-04-29 05:22:57 发布

您现在位置:Python中文网/ 问答频道 /正文

这是我的数据框:

   INV_NUM        LOCATION        CREATED_DATE     DATE_OPENED
      1             North           10/10/2020      10/11/2020
      2             North           10/11/2020      10/11/2020
      3             South           10/12/2020      10/01/2020
      4             West            10/13/2020      WIP

我想创建一个输出,生成一个名为“STATUS”的新列。如果“创建日期”在“打开日期”之前返回“关闭”,如果在“打开日期”当天或之后返回“打开”,如果“打开日期”为“WIP”,则返回“关闭”

下面是我为此编写的代码:

定义一个函数:

def date_test(row):
    if row['DATE OPENED'] == 'WIP':
        return 'Closed'
    if row['CREATED_DATE'] < row['DATE OPENED']:
        return 'Closed'
    if row['CREATED_DATE'] >= row['DATE OPENED']:
        return 'Open'
    else:
        return 0000

将类型更改为datetime:

df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')

测试功能:

df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)

当我运行该代码时,这里是我得到的错误:

TypeError: '<' not supported between instances of 'Timestamp' and 'str'

我还发现,“DATE\u OPENED”列的数据类型在运行到\u datetime之后仍然是“non-null object”

我认为是“WIP”抛出了它,但它应该忽略这一个,继续更改其他的,函数会首先查找“WIP”字符串


Tags: 函数代码testdfdatetimedatereturnif
2条回答

你可以简化逻辑。给定df,将两列转换为_datetime,将NaT保留为无效日期(“WIP”)

df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='coerce')

现在创建一个列状态,默认为“Closed”:

df['STATUS'] = 'Closed'

由于与NaT的任何比较都会得到False,因此可以将“Open”设置为

df.loc[df['CREATED_DATE'] >= df['DATE_OPENED'], 'STATUS'] = 'Open'

给你

df
   INV_NUM LOCATION CREATED_DATE DATE_OPENED  STATUS
0        1    North   2020-10-10  2020-10-11  Closed
1        2    North   2020-10-11  2020-10-11  Open
2        3    South   2020-10-12  2020-10-01  Open
3        4     West   2020-10-13         NaT  Closed

我认为问题是这样的:

df['DATE OPENED'] = pd.to_datetime(df['DATE OPENED'], errors='ignore')

errors='ignore'似乎以原始形式返回整个序列,因此该列中的任何字符串都不会转换为datetime对象。我建议使用errors='coerce'将“WIP”stings更改为NaT对象,并且您必须更新您的函数来处理该问题,但类似的方法应该可以工作:

import pandas as pd


df = pd.DataFrame(
    dict(
        INV_NUM = [1,2,3,4],
        LOCATION = "North North South West".split(),
        CREATED_DATE = "10/10/2020 10/11/2020 10/12/2020 10/13/2020".split(),
        DATE_OPENED = "10/11/2020 10/11/2020 10/01/2020 WIP".split(),
))

def date_test(row):
    if row['DATE_OPENED'] is pd.NaT:
        return 'Closed'
    if row['CREATED_DATE'] < row['DATE_OPENED']:
        return 'Closed'
    if row['CREATED_DATE'] >= row['DATE_OPENED']:
        return 'Open'
    else:
        return 0000

df['DATE_OPENED'] = pd.to_datetime(df['DATE_OPENED'], errors='coerce')
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'], errors='raise')

df['STATUS'] = df.apply(lambda row: date_test(row), axis=1)

step through the code in python tutor

编辑: 如果您想保留errors='ignore'和那些“WIP”字符串,那么这应该可以:

df['DATE_OPENED'] = df['DATE_OPENED'].apply(
    lambda x: pd.to_datetime(x, errors='ignore')
)

python tutor link 2

相关问题 更多 >