Skip rows with bad dates while using pd.read_csv

User

Answer 1 · posted 2024-04-18 23:27:59

Here is another approach, using the Series.convert_objects() method:

import numpy as np
import pandas as pd

# make good and bad date csv files
# read in good dates file using parse_dates - no problem
df = pd.read_csv('dategood.csv', parse_dates=['dates'], date_parser=np.datetime64)

df.dtypes

dates    datetime64[ns]
data            float64
dtype: object

# try same code on bad dates file - throws exceptions
df = pd.read_csv('datebad.csv', parse_dates=['dates'], date_parser=np.datetime64)

ValueError: Error parsing datetime string "Q%Bte0tvk5" at position 0

# read the file first without converting dates
# then use convert_objects to force the conversion
df = pd.read_csv('datebad.csv')
df['cdate'] = df.dates.convert_objects(convert_dates='coerce')

# the new cdate column is datetime64, same as in the good data file
df.dtypes

dates            object
data            float64
cdate    datetime64[ns]
dtype: object

# the bad date has NaT in the cdate column - can clean it later
df.head()

        dates      data      cdate
0  2015-12-01  0.914836 2015-12-01
1  2015-12-02  0.866848 2015-12-02
2  2015-12-03  0.103718 2015-12-03
3  2015-12-04  0.514086 2015-12-04
4  Q%Bte0tvk5  0.583617        NaT
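Series.convert_objects was deprecated and later removed from pandas, so the code above only runs on old versions. A minimal sketch of the same idea on current pandas uses pd.to_datetime with errors='coerce'. The inline CSV text is made up to mirror the output above and stands in for datebad.csv; passing format='%Y-%m-%d' assumes the valid dates are all ISO 8601.

```python
import io

import pandas as pd

# Made-up stand-in for datebad.csv: the last row has a garbage date string
csv_text = """dates,data
2015-12-01,0.914836
2015-12-02,0.866848
2015-12-03,0.103718
2015-12-04,0.514086
Q%Bte0tvk5,0.583617
"""

df = pd.read_csv(io.StringIO(csv_text))

# errors='coerce' replaces anything that does not match the format with NaT
df["cdate"] = pd.to_datetime(df["dates"], format="%Y-%m-%d", errors="coerce")

print(df.dtypes)
print(df)
```

Because the raw dates column is kept, you can still look at the original strings for the rows where cdate is NaT.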

User

Answer 2 · posted 2024-04-18 23:27:59

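Use the built-in pd.to_datetime with errors='coerce', which turns values that are not dates into NaT instead of raising (date_parser is deprecated since pandas 2.0; there, read the column as text and call pd.to_datetime on it afterwards):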

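from io import BytesIO
import pandas as pd

# raw_data: the CSV contents as bytes, from the question
df = pd.read_csv(
    BytesIO(raw_data),
    parse_dates=['dates'],
    date_parser=lambda col: pd.to_datetime(col, errors='coerce'),
)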

You can now filter out the invalid rows with a standard NaN/null check.

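A minimal sketch of that null check, using a small made-up frame whose unparseable date has already become NaT; isna() picks out the bad rows for inspection and notna() keeps the good ones:

```python
import pandas as pd

# Made-up sample: the third value cannot be parsed and becomes NaT
df = pd.DataFrame({
    "dates": pd.to_datetime(
        ["2015-12-01", "2015-12-02", "not a date"],
        format="%Y-%m-%d",
        errors="coerce",
    ),
    "data": [0.91, 0.87, 0.10],
})

# Rows whose date failed to parse (handy for logging or fixing later)
bad_rows = df[df["dates"].isna()]

# Rows with a valid date; same result as df.dropna(subset=["dates"])
good_rows = df[df["dates"].notna()]

print(bad_rows)
print(good_rows)
```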

User

Answer 3 · posted 2024-04-18 23:27:59

"Somewhere in the CSV that's being sent, there is a misformatted date."

np.datetime64 needs ISO 8601-formatted strings to work correctly. The good news is that you can wrap np.datetime64 in your own function and use that as the date_parser:

import numpy as np
import pandas as pd

def parse_date(v):
    try:
        return np.datetime64(v)
    except ValueError:
        # apply whatever remedies you deem appropriate
        pass
    return v

pd.read_csv(
    ...,
    date_parser=parse_date,
)
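np.datetime64 parses ISO 8601 strings such as 2015-12-01 and raises ValueError on other layouts. The sketch below applies the same kind of wrapper to an already loaded column with Series.map, which avoids date_parser (deprecated since pandas 2.0). Returning NaT on failure is an illustrative remedy chosen here, not part of the answer, and the sample values are made up:

```python
import numpy as np
import pandas as pd


def parse_date(v):
    """Parse v with np.datetime64, falling back to NaT if it is not ISO 8601."""
    try:
        return np.datetime64(v)
    except ValueError:
        # Illustrative remedy: mark the value as missing instead of keeping it
        return np.datetime64("NaT")


values = pd.Series(["2015-12-01", "12/01/2015", "Q%Bte0tvk5"])
parsed = values.map(parse_date)
print(parsed)
```

Note that the US-style "12/01/2015" is rejected just like the garbage string, because np.datetime64 does not guess layouts.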

"I need pandas to keep and parse that other data."

I often find that a more flexible date parser works better than np.datetime64, and may not even need the extra function.

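One widely used flexible parser is dateutil.parser.parse from the python-dateutil package (a dependency of pandas); treating it as the parser meant here is an assumption. A short sketch comparing it with np.datetime64 on a made-up month/day/year string:

```python
import numpy as np
from dateutil import parser

text = "12/01/2015"  # month/day/year, not ISO 8601

# dateutil infers the layout (month first unless dayfirst=True)
flexible = parser.parse(text)
print(flexible)  # 2015-12-01 00:00:00

# np.datetime64 only understands ISO 8601-style strings
numpy_failed = False
try:
    np.datetime64(text)
except ValueError as exc:
    numpy_failed = True
    print("np.datetime64 failed:", exc)

# dateutil still raises on values that are not dates at all
dateutil_failed = False
try:
    parser.parse("Q%Bte0tvk5")
except (ValueError, OverflowError) as exc:
    dateutil_failed = True
    print("dateutil failed:", exc)
```

So a flexible parser widens what counts as a valid date, but rows that are pure garbage still need errors='coerce' or a try/except wrapper to become NaT.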
