Date parser function for read_csv not working

Posted 2021-02-25 04:50:08


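I have three different datasets that I read with pd.read_csv. One of the columns is a time in seconds, and I want to parse it with a function I wrote for the date_parser parameter of pd.read_csv. It works fine when all the data are integers, but when a value is a string or a float, my function no longer works. I think the problem occurs in the datetime.datetime.fromtimestamp(float(time_in_secs)) part of my function. Does anyone know how I can make this work on all of my datasets? I'm completely stuck. Below are samples of the 3 different datasets.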

Dataset 1

555, 1404803485, 800
555, 1408906759, 900

Dataset 2

231, 1404803485, pass
231, 1404803490, fail

Dataset 3

16010925, 1403890894, 40.5819880696
16010925, 1903929273, 40.5819880696

import datetime

import pandas as pd


def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        # treat the '\N' placeholder as 0 seconds
        if time_in_secs == '\\N':
            time_in_secs = 0

    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    # round down to the start of the 10-minute interval
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm


pd.read_csv('dataset_here.csv',
            delimiter=',', index_col=[0, 1], parse_dates=['Timestamp'],
            date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])
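The suspicion about float() can be checked in isolation: float() accepts strings that look like integers or decimals, but raises ValueError for any other text, such as the '\N' placeholder or a word like 'pass'. The helper name float_or_none below is hypothetical and used only for this illustration.

```python
def float_or_none(raw):
    """Return float(raw), or None when float() rejects the value."""
    try:
        return float(raw)
    except ValueError:
        return None


# Sample values from the datasets above, plus the '\N' placeholder
# that the question's dateparse special-cases.
samples = ['1404803485', '40.5819880696', '\\N', 'pass']
for raw in samples:
    print(repr(raw), '->', float_or_none(raw))
```

Any value that comes back as None here is one that fromtimestamp(float(...)) cannot handle without some fallback.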
1 answer

Answer #1 (anonymous user)

I think every string value needs to be converted to 0 (instead of only '\N'); with that change your solution works well:

import datetime


def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        # https://stackoverflow.com/a/45372194
        # time_in_secs = 86400
        time_in_secs = 0

    # print(time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm
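The dateparse function above ends by rounding the parsed time down to the start of its 10-minute interval: subtracting minute % 10 minutes plus all seconds and microseconds turns 07:11:25 into 07:10:00. A minimal standalone check of that arithmetic, using the first timestamp from dataset 1. floor_to_10_minutes is a hypothetical helper name, and tz=datetime.timezone.utc is added here only so the result does not depend on the local time zone (dateparse itself produces local time).

```python
import datetime


def floor_to_10_minutes(tm):
    # Same subtraction as in dateparse: remove the minutes past the last
    # multiple of 10, plus all seconds and microseconds.
    return tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)


# tz=utc makes the result independent of the machine's local time zone.
tm = datetime.datetime.fromtimestamp(1404803485, tz=datetime.timezone.utc)
print(tm)                       # 2014-07-08 07:11:25+00:00
print(floor_to_10_minutes(tm))  # 2014-07-08 07:10:00+00:00
```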

A more general solution: try to convert the value to a float, and if that fails, use a default value instead:


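Example, tested on Windows (fromtimestamp returns local time, so the printed times depend on the machine's time zone):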

import datetime
import io

import pandas as pd


def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            # not a number: fall back to a default timestamp
            # https://stackoverflow.com/a/45372194
            # time_in_secs = 0
            time_in_secs = 86400

    # print(time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm


temp = """16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""
# after testing, replace io.StringIO(temp) with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), index_col=[0, 1], parse_dates=['Timestamp'],
                 date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])

print(df)
                                 result
Serial   Timestamp                     
16010925 1970-01-02 01:00:00  40.581988
         2030-05-02 07:10:00  40.581988

print (df.index.get_level_values(1))
DatetimeIndex(['1970-01-02 01:00:00', '2030-05-02 07:10:00'], 
              dtype='datetime64[ns]', name='Timestamp', freq=None)
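On pandas 2.0 and later the date_parser argument of read_csv is deprecated, so a different route may be needed there. A minimal sketch of one alternative, assuming the same column names and the same 86400-second default as the answer: read the column unparsed, convert it with pd.to_numeric(errors='coerce') so non-numeric strings become NaN, fill the gaps, then convert with pd.to_datetime(unit='s') and round down with .dt.floor('10min'). This is a swapped-in vectorized technique, not the answer's method. Note that to_datetime with unit='s' yields UTC wall-clock times rather than the local times that fromtimestamp returns, so the values differ from the output above by the machine's UTC offset.

```python
import io

import pandas as pd

temp = """16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""

# Read without any date parsing; 'Timestamp' stays a plain column.
df = pd.read_csv(io.StringIO(temp), names=['Serial', 'Timestamp', 'result'])

# Non-numeric strings such as 'test' or '\N' become NaN instead of raising.
secs = pd.to_numeric(df['Timestamp'], errors='coerce')
# Same default as the answer: 86400 seconds after the epoch.
secs = secs.fillna(86400)
# unit='s' reads the numbers as seconds since the Unix epoch (naive UTC);
# floor('10min') rounds down to the start of each 10-minute interval.
df['Timestamp'] = pd.to_datetime(secs, unit='s').dt.floor('10min')
df = df.set_index(['Serial', 'Timestamp'])
print(df)
```

Because every step works on the whole column at once, this also avoids calling a Python function per row.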
