How do I fix a ValueError when converting to datetime?

Posted 2024-04-26 00:38:36


I get an error saying that the time data does not match my format.

Here is a sample of the data:

import pandas as pd

data = pd.DataFrame({'TransactionTime': ['Sat Feb 02 12:50:00 IST 2019']})

Here is my code:

data['TransactionTime'] = pd.to_datetime(data['TransactionTime'], format = '%a %b %d %H:%M:%S %Z %Y')

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    431             try:
--> 432                 values, tz = conversion.datetime_to_datetime64(arg)
    433                 return DatetimeIndex._simple_new(values, name=name, tz=tz)

pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-139-ed51a35d7ed3> in <module>
----> 1 data['TransactionTime'] = pd.to_datetime(data['TransactionTime'], format = '%a %b %d %H:%M:%S %Z %Y')

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    726             result = arg.map(cache_array)
    727         else:
--> 728             values = convert_listlike(arg._values, format)
    729             result = arg._constructor(values, index=arg.index, name=arg.name)
    730     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    433                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    434             except (ValueError, TypeError):
--> 435                 raise e
    436 
    437     if result is None:

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    398                 try:
    399                     result, timezones = array_strptime(
--> 400                         arg, format, exact=exact, errors=errors
    401                     )
    402                     if "%Z" in format or "%z" in format:

pandas\_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()

ValueError: time data 'Sat Feb 02 12:50:00 IST 2019' does not match format '%a %b %d %H:%M:%S %Z %Y' (match)

2 answers

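The error most likely occurs because %Z cannot resolve IST to a specific time zone. Several time zones are abbreviated "IST", so the abbreviation is ambiguous in any case.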
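To make the ambiguity concrete, here is a small sketch using the standard-library zoneinfo module (Python 3.9+, with time zone data available on the system). The three zones and the sample dates are my own picks for illustration, not something taken from the question.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Illustrative picks: three zones that label (part of) the year "IST".
candidates = {
    'Asia/Kolkata': datetime(2019, 2, 2, 12, 50),    # India Standard Time
    'Asia/Jerusalem': datetime(2019, 2, 2, 12, 50),  # Israel Standard Time (winter)
    'Europe/Dublin': datetime(2019, 7, 1, 12, 50),   # Irish Standard Time (summer)
}

abbreviations = {}
offsets = {}
for zone, moment in candidates.items():
    aware = moment.replace(tzinfo=ZoneInfo(zone))
    abbreviations[zone] = aware.tzname()
    offsets[zone] = aware.utcoffset()
    print(zone, abbreviations[zone], offsets[zone])
```

All three print the abbreviation IST but with different UTC offsets, which is why a parser cannot safely guess which one is meant.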

To parse an abbreviation such as "IST" as a specific time zone, you can define a mapping dict and pass it to dateutil's parser.parse:

import pandas as pd
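import dateutil.parser  # import the submodules explicitly rather than relying on pandas having loaded them
import dateutil.tz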

tzmap = {'IST': dateutil.tz.gettz('Asia/Kolkata')}

data = pd.DataFrame({'TransactionTime': ['Sat Feb 02 12:50:00 IST 2019']})

data['TransactionTime'] = data['TransactionTime'].apply(lambda t: dateutil.parser.parse(t, tzinfos=tzmap))

# data['TransactionTime']
# 0   2019-02-02 12:50:00+05:30
# Name: TransactionTime, dtype: datetime64[ns, tzfile('Asia/Calcutta')]
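If you would rather stay within pandas, one possible alternative (a sketch, not part of the answer above) is to replace the abbreviation with a numeric offset and parse it with %z. This assumes that every "IST" in the column really means India Standard Time (UTC+05:30); the variable name cleaned is just for illustration.

```python
import pandas as pd

data = pd.DataFrame({'TransactionTime': ['Sat Feb 02 12:50:00 IST 2019']})

# Assumption: 'IST' always stands for India Standard Time here.
cleaned = data['TransactionTime'].str.replace(' IST ', ' +0530 ', regex=False)

# %z understands numeric UTC offsets, so the ambiguous abbreviation is gone.
data['TransactionTime'] = pd.to_datetime(cleaned, format='%a %b %d %H:%M:%S %z %Y')

print(data['TransactionTime'])
```

The result carries a fixed +05:30 offset; chain .dt.tz_convert('Asia/Kolkata') if you want a named zone instead.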

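If your data contains several time zones:

  • Map each unique time zone to its corresponding UTC offset, then localize.
  • Once the DateTime column has been created, the other columns can be dropped:
    • df.drop(columns=['a', 'b', 'd', 'time', 'tz', 'Y', 'TTime'], inplace=True)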
import pandas as pd

# data and dataframe
df = pd.DataFrame({'TTime': ['Sat Feb 02 12:50:00 IST 2019', 'Sat Feb 02 12:50:00 EST 2019']})

                        TTime
 Sat Feb 02 12:50:00 IST 2019
 Sat Feb 02 12:50:00 EST 2019

# split the string into components; assumes all strings are formatted similarly
df[['a', 'b', 'd', 'time', 'tz', 'Y']] = df.TTime.str.split(expand=True)

                        TTime    a    b   d      time   tz     Y
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  IST  2019
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00  EST  2019

# create list of unique time zones
uni_tzs = df.tz.unique().tolist()
print(uni_tzs)
>>> ['IST', 'EST']

# UTC offset for each timezone
tzs = ['+05:30', '-05:00']

# combine into a dict
maps = dict(zip(uni_tzs, tzs))

# map the different time zones to their UTC offsets
df.tz = df.tz.map(maps)

# create the DateTime column; the offsets are mixed, so parse straight to UTC,
# then convert to a time zone of your choice
df['DateTime'] = pd.to_datetime(df.Y + df.b + df.d + df.time + df.tz, format='%Y%b%d%H:%M:%S%z', utc=True).dt.tz_convert('Asia/Kolkata')

                        TTime    a    b   d      time      tz     Y                  DateTime
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  +05:30  2019 2019-02-02 12:50:00+05:30
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00  -05:00  2019 2019-02-02 23:20:00+05:30
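The steps above pair the result of unique() with a hand-written list of offsets, which silently breaks if the abbreviations appear in a different order. Below is a sketch of the same idea wrapped in a function with an explicit dict. The names OFFSETS and parse_with_offsets are hypothetical, and the offsets are assumptions about what "IST" and "EST" mean in your data.

```python
import pandas as pd

# Assumed meanings: IST = India Standard Time, EST = US Eastern Standard Time.
OFFSETS = {'IST': '+05:30', 'EST': '-05:00'}


def parse_with_offsets(series, offsets):
    """Parse strings like 'Sat Feb 02 12:50:00 IST 2019' into UTC timestamps."""
    parts = series.str.split(expand=True)
    parts.columns = ['a', 'b', 'd', 'time', 'tz', 'Y']

    # Fail loudly on abbreviations that have no agreed offset.
    unknown = set(parts['tz']) - set(offsets)
    if unknown:
        raise ValueError(f'no UTC offset defined for: {sorted(unknown)}')

    stamp = parts['Y'] + parts['b'] + parts['d'] + parts['time'] + parts['tz'].map(offsets)
    return pd.to_datetime(stamp, format='%Y%b%d%H:%M:%S%z', utc=True)


df = pd.DataFrame({'TTime': ['Sat Feb 02 12:50:00 IST 2019',
                             'Sat Feb 02 12:50:00 EST 2019']})
df['DateTime'] = parse_with_offsets(df['TTime'], OFFSETS).dt.tz_convert('Asia/Kolkata')
print(df)
```

This keeps the original frame free of helper columns, so no drop is needed afterwards.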

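Alternatively:

  • Map to time zone names rather than offsets.
    • Use %Z instead of %z in the format argument.
# assumes df.tz still holds the original abbreviations ('IST', 'EST'),
# i.e. run this instead of the offset mapping above
df_tzs = df.tz.unique().tolist()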
tzs = ['Asia/Kolkata', 'US/Eastern']
maps = dict(zip(df_tzs, tzs))

df.tz = df.tz.map(maps)

df['DateTime'] = pd.to_datetime(df.Y + df.b + df.d + df.time + df.tz, format='%Y%b%d%H:%M:%S%Z', utc=True).dt.tz_convert('Asia/Kolkata')

                        TTime    a    b   d      time            tz     Y                  DateTime
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  Asia/Kolkata  2019 2019-02-02 12:50:00+05:30
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00    US/Eastern  2019 2019-02-02 23:20:00+05:30
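If %Z does not accept the zone names in your pandas version, a possible variation (a sketch under the same assumed mapping) is to parse the timestamps as naive values and localize each group of rows with tz_localize. The dict ZONES and the intermediate names are hypothetical.

```python
import pandas as pd

# Assumed mapping from abbreviation to a named time zone.
ZONES = {'IST': 'Asia/Kolkata', 'EST': 'US/Eastern'}

df = pd.DataFrame({'TTime': ['Sat Feb 02 12:50:00 IST 2019',
                             'Sat Feb 02 12:50:00 EST 2019']})

parts = df['TTime'].str.split(expand=True)
parts.columns = ['a', 'b', 'd', 'time', 'tz', 'Y']

# Parse without any time zone information first.
naive = pd.to_datetime(parts['Y'] + parts['b'] + parts['d'] + parts['time'],
                       format='%Y%b%d%H:%M:%S')
zone_names = parts['tz'].map(ZONES)

# Localize each zone's rows separately, then bring everything to UTC.
pieces = [
    naive[zone_names == zone].dt.tz_localize(zone).dt.tz_convert('UTC')
    for zone in zone_names.dropna().unique()
]
df['DateTime'] = pd.concat(pieces).sort_index()
print(df)
```

Rows whose abbreviation is missing from ZONES end up as NaT rather than raising, so check for those if your data is not as clean as the sample.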
