How do I split the day, month, and year out of the following DataFrame column? The data covers 8 million users.

Posted 2024-04-20 13:48:32


I tried using the DatetimeIndex approach.

The column containing the values looks like this:

reg_date                    

2013-06-10T00:00:00.000Z

2014-09-30T00:00:00.000Z

2014-09-30T00:00:00.000Z

2014-09-30T00:00:00.000Z

2014-10-01T00:00:00.000Z



type(df.reg_date) yields

pandas.core.series.Series

and I used the following:

 df['reg_month'] = pd.DatetimeIndex(df['reg_date']).month

This worked for me on earlier data, but DatetimeIndex does not work here,

and I get the following error:


TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
    302             try:
--> 303                 values, tz = tslib.datetime_to_datetime64(arg)
    304                 return DatetimeIndex._simple_new(values, name=name, tz=tz)

pandas/_libs/tslib.pyx in pandas._libs.tslib.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-22-4e7ef5ca2997> in <module>()
----> 1 df['reg_month'] = pd.DatetimeIndex(df['reg_date']).month

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    340                 is_integer_dtype(data)):
    341             data = tools.to_datetime(data, dayfirst=dayfirst,
--> 342                                      yearfirst=yearfirst)
    343 
    344         if issubclass(data.dtype.type, np.datetime64) or is_datetimetz(data):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
    378         result = _convert_listlike(arg, box, format, name=arg.name)
    379     elif is_list_like(arg):
--> 380         result = _convert_listlike(arg, box, format)
    381     else:
    382         result = _convert_listlike(np.array([arg]), box, format)[0]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
    304                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    305             except (ValueError, TypeError):
--> 306                 raise e
    307 
    308     if arg is None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
    292                     dayfirst=dayfirst,
    293                     yearfirst=yearfirst,
--> 294                     require_iso8601=require_iso8601
    295                 )
    296 

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser.py in parse(timestr, parserinfo, **kwargs)
   1180         return parser(parserinfo).parse(timestr, **kwargs)
   1181     else:
-> 1182         return DEFAULTPARSER.parse(timestr, **kwargs)
   1183 
   1184 

C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    557 
    558         if res is None:
--> 559             raise ValueError("Unknown string format")
    560 
    561         if len(res) == 0:

ValueError: Unknown string format

2 answers

You can convert the data to datetime objects:

import pandas as pd
df['reg_date'] = pd.to_datetime(df['reg_date'], errors='coerce')
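Because `errors='coerce'` silently turns any value it cannot parse into `NaT`, it may be worth checking which raw strings failed before overwriting the column; the "Unknown string format" error in the question suggests at least one value is malformed. A minimal sketch, assuming a made-up sample in which one row ("not a date") is bad; the names `sample`, `parsed`, and `bad_rows` are illustrative only:

```python
import pandas as pd

# Hypothetical sample: three valid ISO 8601 strings and one malformed value
sample = pd.DataFrame({
    "reg_date": [
        "2013-06-10T00:00:00.000Z",
        "2014-09-30T00:00:00.000Z",
        "not a date",
        "2014-10-01T00:00:00.000Z",
    ]
})

# Parse into a separate Series so the raw strings stay available for inspection
parsed = pd.to_datetime(sample["reg_date"], errors="coerce", utc=True)

# Rows that were present in the input but could not be parsed
bad_rows = sample.loc[parsed.isna() & sample["reg_date"].notna(), "reg_date"]
print(bad_rows)
```

On the real data, `bad_rows.unique()` would list the distinct offending strings, which is usually a short list even for millions of rows.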

Then you can extract the month like this:

df['month'] = df['reg_date'].dt.month

Output:

    time    month
0   2013-06-10  6
1   2014-09-30  9
2   2014-09-30  9
3   2014-09-30  9
4   2014-10-01  10

Here is the documentation.
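The question asks for the day, month, and year, and the same `.dt` accessor covers all three. A short sketch using sample values copied from the question; the column names `reg_year` and `reg_day` are assumptions chosen to match the `reg_month` name used in the question:

```python
import pandas as pd

# Sample values copied from the question
df = pd.DataFrame({
    "reg_date": [
        "2013-06-10T00:00:00.000Z",
        "2014-09-30T00:00:00.000Z",
        "2014-10-01T00:00:00.000Z",
    ]
})

# utc=True makes the result timezone-aware (the trailing "Z" means UTC)
df["reg_date"] = pd.to_datetime(df["reg_date"], errors="coerce", utc=True)

# The .dt accessor exposes each component as a vectorized attribute
df["reg_year"] = df["reg_date"].dt.year
df["reg_month"] = df["reg_date"].dt.month
df["reg_day"] = df["reg_date"].dt.day

print(df[["reg_year", "reg_month", "reg_day"]])
```

These attribute lookups run on the whole column at once, so they should scale to millions of rows without a Python-level loop.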

import pandas as pd

# Split each ISO string once and collect the date parts (kept as strings)
n = {"year": [], "month": [], "day": []}
for i in df['reg_date']:
    year, month, day = i.split("T")[0].split("-")
    n["year"].append(year)
    n["month"].append(month)
    n["day"].append(day)

# 'n' is now a dictionary holding the year, month, and day separated from df["reg_date"]

Another approach

df["reg_date"] = df["reg_date"].apply(lambda x: x.split("T")[0]) 

# df["reg_date"] now holds only the date part (a string such as "2013-06-10") for each record
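With 8 million rows, a Python loop or `apply` with a lambda can be slow. The same string-splitting idea can be written with the vectorized `.str` accessor. This is a sketch that assumes `reg_date` still holds the original ISO strings, every one starting with `YYYY-MM-DD`; the name `parts` is illustrative:

```python
import pandas as pd

# Sample values copied from the question
df = pd.DataFrame({
    "reg_date": [
        "2013-06-10T00:00:00.000Z",
        "2014-09-30T00:00:00.000Z",
        "2014-10-01T00:00:00.000Z",
    ]
})

# Take the first 10 characters ("YYYY-MM-DD"), then split on "-" into three columns
parts = df["reg_date"].str.slice(0, 10).str.split("-", expand=True)
parts.columns = ["year", "month", "day"]

# The pieces are strings; convert them to integers if numbers are needed
parts = parts.astype(int)
print(parts)
```

Unlike `pd.to_datetime`, this does not validate the values, so a malformed string would either raise during the integer conversion or produce the wrong number of columns.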
