Reading a UTF-8 CSV file that contains EOF and null bytes in Python

Posted 2024-04-22 20:31:36


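I have a UTF-8 encoded file that contains both EOF and null bytes.

As I understand it, the fix for reading EOF into a DataFrame is to use engine='python', while the fix for reading null bytes is to use engine='c'. How should I handle a file that contains both?

Thanks, everyone!

Edit:

Running the following code: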

pd.read_csv('extract.csv', sep = ",", encoding='utf-8', quotechar='"', engine='python')

raises this error:

pandas.errors.ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

Changing the engine as suggested:

pd.read_csv('extract.csv', sep = ",", encoding='utf-8', quotechar='"', engine='c')

raises this error:

pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 0


1 answer
Answer 1 · posted 2024-04-22 20:31:36

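Your file was not written correctly and begins with a NUL (\0) character. A file containing only that byte reproduces the error: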

>>> import pandas as pd
>>> with open("extract.csv", 'wb') as fh:
...   fh.write(b"\0")
...
1
>>> pd.read_csv('extract.csv', sep = ",", encoding='utf-8', quotechar='"', engine='python')
Traceback (most recent call last):
[omitted]
_csv.Error: line contains NULL byte
During handling of the above exception, another exception occurred:
[omitted]
pandas.errors.ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

If your file looks empty or is very small, it was probably written incorrectly and is simply missing its content.
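One way to check this before involving pandas is to look at the file size and its first few bytes in binary mode. The sketch below is only an illustration: the helper name describe_file_start is made up for this example, and the demo writes a throwaway one-byte file in a temporary directory rather than touching the real extract.csv.

```python
import os
import tempfile


def describe_file_start(path, n=16):
    """Return (size in bytes, first n bytes, number of NUL bytes among them).

    Hypothetical helper for illustration; not part of pandas.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:  # binary mode: nothing is decoded, NUL bytes stay visible
        head = fh.read(n)
    return size, head, head.count(b"\0")


# Demo on a throwaway file that mimics the one-byte file from the reproduction.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "extract.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"\0")
    size, head, nul_count = describe_file_start(demo_path)

print(size, head, nul_count)  # 1 b'\x00' 1
```

A size of 0 or 1 suggests the file never received its content. A larger file whose first bytes include \x00 points to stray NUL bytes instead.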

If your file does have content, you can work around the problem by seeking a little way into it (the fh.seek() method) before reading:

import pandas as pd

with open("extract.csv", encoding="utf-8") as fh:
    fh.seek(1)  # step past the leading NUL byte before attempting to read
    # the encoding is already set on open(), so it is not repeated here
    df = pd.read_csv(fh, sep=",", quotechar='"', engine='python')

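A file object keeps a pointer that marks the position you are reading from. Seeking one byte forward skips the NUL byte before the file is passed to pandas (which should work with any file-like object that has a read() method). This only helps when the single NUL byte sits at the very start of the file.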
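If NUL bytes may also appear elsewhere in the file, seeking past the first byte is not enough. A possible alternative, sketched here under the assumption that the file really is UTF-8 and fits in memory, is to strip every NUL byte and hand pandas an in-memory buffer. The helper name open_without_nul is invented for this example. If NUL bytes show up after nearly every character, the file is more likely UTF-16, and stripping them would be the wrong fix.

```python
import io
import os
import tempfile


def open_without_nul(path, encoding="utf-8"):
    """Return an in-memory text buffer of the file with every NUL byte removed.

    Hypothetical helper for illustration; reads the whole file into memory.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    # In UTF-8 the byte 0x00 only ever encodes the NUL character itself,
    # so dropping it cannot split a multi-byte sequence.
    cleaned = raw.replace(b"\0", b"")
    return io.StringIO(cleaned.decode(encoding))


# Demo on a throwaway file with a leading and an embedded NUL byte.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "extract.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"\0a,b\n1,\0" + "é".encode("utf-8") + b"\n")
    buffer = open_without_nul(demo_path)

text = buffer.getvalue()
print(repr(text))
```

The returned buffer has a read() method, so it can be passed to pd.read_csv(buffer, sep=",", quotechar='"') in place of the file name.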

If your file is empty after that, you will get a different error:

pandas.errors.EmptyDataError: No columns to parse from file
