UnicodeDecodeError when reading a .csv file

Traceback (most recent call last):
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 17, in <module>
    deleteRow(file, "output/" + file)
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 8, in deleteRow
    for row in csv.reader(input):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 6540: invalid start byte

2 Answers

User

Answer 1 · edited 2024-06-16 10:24:16

Maybe you could try looping through the csv files that crash, for example:

# Binary mode: no decoding happens, so this loop cannot raise UnicodeDecodeError.
with open(file, 'rb') as f:
    for line in f:
        print(repr(line))

and see whether any suspicious characters show up.
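On a large file, printing every line may be too noisy. As a rough sketch of the same idea, the helper below reports only the lines that fail to decode. The function name `find_undecodable_lines` and its `encoding` default are illustrative assumptions, not part of the original answer.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode."""
    bad = []
    # Binary mode: the file object yields raw bytes and never decodes them.
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                # Keep the raw bytes so the offending byte can be inspected.
                bad.append((number, raw))
    return bad
```

Calling `find_undecodable_lines(file)` and printing the result with `repr` should show the suspicious bytes as escapes such as `\xa2`.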

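If you manage to identify the suspicious character this way, say \0Xý1 pops up, you can clean the file by rewriting it and replacing that character: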

[code block missing in the source]
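A minimal sketch of such a clean-up, assuming the suspicious bytes are already known, could look like the following. The name `replace_bytes` and its parameters are hypothetical, and the work is done on raw bytes so that no decoding is needed.

```python
def replace_bytes(src_path, dst_path, bad, replacement=b""):
    """Copy src_path to dst_path, replacing every occurrence of `bad`.

    `bad` and `replacement` are bytes objects, e.g. b"\xa2" and b"".
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            # bytes.replace works on the raw line; nothing is decoded here.
            dst.write(raw.replace(bad, replacement))
```

For example, `replace_bytes(file, "cleaned_" + file, b"\xa2")` would write a copy with every 0xa2 byte removed. Writing to a separate file keeps the original intact in case the replacement turns out to be wrong.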

Then try again with the cleaned file.

User

Answer 2 · edited 2024-06-16 10:24:16

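This is an encoding problem. The input csv file is not encoded as utf-8, which is what the Python platform expects. The trouble is that without knowing its encoding, and without an example of an offending line, I really cannot guess the encoding.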

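It is normal that both encoding='utf8' and encoding='ascii' break, because the offending byte is 0xa2, which is outside the ascii range (<= 0x7f) and is not a valid first byte of a utf-8 sequence. But it is really strange that encoding='latin1' gives the same error at the same place, because 0xa2 is ¢ in latin1.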
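The behaviour of the three codecs on that single byte can be checked directly. The sketch below uses a hypothetical helper `try_decode`. Note that latin1 maps every byte value to a character, so a latin1 decode cannot raise UnicodeDecodeError. Since the traceback goes through utf_8_sig.py, one possible explanation for the identical error is that the encoding argument never reached the call that opens the file.

```python
raw = b"\xa2"  # the byte reported in the traceback

def try_decode(data, encoding):
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason

# Decode the same byte with each codec mentioned in the answer.
results = {enc: try_decode(raw, enc) for enc in ("ascii", "utf-8", "latin1")}
for enc, outcome in results.items():
    print(enc, "->", outcome)
```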

IMHO, according to this other SO post, you could try encoding='windows-1252', if your platform supports it.
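As a sketch of that suggestion, the encoding can be passed to the built-in `open` before the file is handed to `csv.reader`. The function name `read_rows` is an assumption made for illustration; `cp1252` is Python's codec name for windows-1252.

```python
import csv

def read_rows(path, encoding="cp1252"):
    """Read a csv file with an explicit encoding and return its rows."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader, so that embedded newlines are handled correctly.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

A few byte values (for example 0x81 and 0x8d) are undefined in Python's cp1252 codec and still raise UnicodeDecodeError, so this is not guaranteed to succeed on arbitrary input.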

If it still does not work, you should try to identify the offending line for latin1:

import csv
import sys

class special_opener:
    """Iterate over a file line by line, decoding each line explicitly."""
    def __init__(self, filename, encoding):
        self.fd = open(filename, 'rb')  # binary mode: decoding is done in __next__
        self.encoding = encoding
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False
    def close(self):
        self.fd.close()
    def __next__(self):
        line = next(self.fd)
        try:
            return line.decode(self.encoding).strip('\r\n') + '\n'
        except Exception:
            # show the raw bytes of the line that could not be decoded
            print("Offending line : ", line, file=sys.stderr)
            raise
    def __iter__(self):
        return self

def deleteRow(in_fnam, out_fnam):
    input = special_opener(in_fnam, 'latin1')
    output = open(out_fnam, 'w', newline='')
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(row):  # skip rows whose fields are all empty
            writer.writerow(row)
    input.close()
    output.close()

special_opener should output something like:

[example output missing in the source]

(this line is valid latin1; I got it with special_opener(file, 'utf8'))

Then you will be able to post the offending line here.
