What is the "charmap" codec shown in Python's UnicodeErrors?

Posted 2024-04-26 01:29:21


I am running some simple code to see which encodings can decode a particular file:

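import os
import tempfile

# Assumption: csvs is a list of paths to the .csv files under test,
# defined elsewhere (it is not shown in the original snippet).
encodings = ('cp737', 'cp869', 'cp875', 'cp1253', 'iso2022_jp_2', 'iso8859_7',
             'mac_greek', 'utf-8')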

def test_encoding():
    with tempfile.TemporaryDirectory() as tmp_dir:
        for c in csvs:
            for encoding in encodings:
                try:
                    with open(c, 'r', encoding=encoding) as f:
                        content = f.read()
                except UnicodeDecodeError as e:
                    print(encoding, e) # <---- print from here
                    continue
                csv_out = os.path.join(tmp_dir, os.path.basename(
                    c[:-4]) + '_%s.csv' % encoding)
                with open(csv_out, 'w', encoding=encoding,
                          newline='\n') as f:
                    f.write(content)
        input('Files created in %s' % tmp_dir)

This prints:

cp869 'charmap' codec can't decode byte 0x83 in position 28: character maps to <undefined>
cp1253 'charmap' codec can't decode byte 0x8c in position 26: character maps to <undefined>
iso2022_jp_2 'iso2022_jp_2' codec can't decode byte 0xce in position 18: illegal multibyte sequence
iso8859_7 'charmap' codec can't decode byte 0xae in position 84: character maps to <undefined>

So what is the 'charmap' codec? And why does the message sometimes say 'charmap' codec can't..., while in other cases, such as iso2022_jp_2, it prints the name of the encoding itself?
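The difference can be reproduced without any files. The following is only a minimal sketch: `codec_name_in_error` is a hypothetical helper introduced for illustration, and the byte values are the ones reported in the output above. It reads the `encoding` attribute of the `UnicodeDecodeError`, which is the name that appears at the start of the message.

```python
def codec_name_in_error(data, encoding):
    """Return the codec name recorded on the UnicodeDecodeError,
    or None if the bytes decode without error.

    Hypothetical helper, for illustration only.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.encoding is the name shown in "'<name>' codec can't decode ..."
        return e.encoding
    return None


# Byte values taken from the error messages in the question.
samples = (
    ('cp869', b'\x83'),
    ('cp1253', b'\x8c'),
    ('iso2022_jp_2', b'\xce'),
)

for enc, raw in samples:
    print(enc, '->', codec_name_in_error(raw, enc))
```

On the interpreter used in the question, the two single-byte code pages report the same name, `charmap`, while `iso2022_jp_2` reports its own name. So the name in the message appears to identify the decoder implementation that raised the error rather than the encoding passed to `open()`.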

I am on:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)]
