Mapping cp850 to Unicode in Python

1 answer

User

#1 · Posted 2024-04-26 23:30:19

Is it possible to map cp850 to unicode in python?

Yes, just decode the bytes of your data (Python 3 example):

>>> s=b'\xcdABCDEF\xcd\xdbHIJKLMNOP'.decode('cp850')
>>> s
'═ABCDEF═█HIJKLMNOP'
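To see which Unicode code points the cp850 bytes ended up as, the standard `unicodedata` module can be used. This is only a small sketch; the variable names `decoded` and `names` are made up for illustration and are not part of the original answer.

```python
import unicodedata

# Decode the two non-ASCII cp850 bytes used in the example above.
decoded = b'\xcd\xdb'.decode('cp850')

# Pair each character's code point with its official Unicode name.
names = [(hex(ord(ch)), unicodedata.name(ch)) for ch in decoded]

for code_point, name in names:
    print(code_point, name)
```

The codec already knows the full cp850 table, so there is no need to write the byte-to-code-point mapping by hand.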

do i have to map the codes myself?

Only the characters you actually need to translate. Unicode strings have a convenient .translate method that takes a mapping dictionary:

>>> D = {0x2588: '\n'}  # inferred from the output below: █ (U+2588) maps to a newline

When you are done, encode the output as UTF-8:

>>> s.translate(D).encode('utf8')
b'\xe2\x95\x90ABCDEF\xe2\x95\x90\nHIJKLMNOP'
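As a possible alternative to writing code points by hand, `str.maketrans` can build the mapping dictionary from single-character keys. The sketch below assumes the only replacement wanted is █ (U+2588) to a newline, which is what the output above suggests; the names `raw`, `text`, `table`, `translated` and `encoded` are illustrative.

```python
raw = b'\xcdABCDEF\xcd\xdbHIJKLMNOP'
text = raw.decode('cp850')  # bytes -> str using the built-in cp850 codec

# str.translate looks characters up by integer code point.
# str.maketrans converts single-character string keys into those integers.
table = str.maketrans({'\u2588': '\n'})  # assumed mapping: FULL BLOCK -> newline

translated = text.translate(table)
encoded = translated.encode('utf8')  # str -> UTF-8 bytes for output

print(table)
print(encoded)
```

Characters that are not in the table, such as ═ here, pass through unchanged, so only the replacements you care about need to be listed.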

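The key is to decode to Unicode when you read data, do all processing in Unicode, and encode back to bytes when you send the data to storage. For example, for a file: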

with open('out.txt','w',encoding='utf8') as f:
    f.write(s)
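The same idea also covers the reading side: if the source is a cp850 file, `open` can do the decoding. Here is a rough sketch of a full round trip. The helper `convert_cp850_to_utf8` and the file names are hypothetical, and a temporary directory is used only so the example is self-contained.

```python
import os
import tempfile


def convert_cp850_to_utf8(src_path, dst_path):
    """Hypothetical helper: re-encode a cp850 text file as UTF-8."""
    # newline='' leaves line endings untouched in both directions.
    with open(src_path, 'r', encoding='cp850', newline='') as src:
        text = src.read()  # decoded to str while reading
    with open(dst_path, 'w', encoding='utf8', newline='') as dst:
        dst.write(text)  # encoded to UTF-8 while writing
    return text


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'in.txt')
    dst_path = os.path.join(tmp, 'out.txt')

    # Create a sample cp850 file from the bytes used earlier.
    with open(src_path, 'wb') as f:
        f.write(b'\xcdABCDEF\xcd\xdbHIJKLMNOP')

    text = convert_cp850_to_utf8(src_path, dst_path)

    # Read the result back as raw bytes to inspect the UTF-8 encoding.
    with open(dst_path, 'rb') as f:
        utf8_bytes = f.read()

print(utf8_bytes)
```

Any `.translate` step would go between the read and the write, while the text is still a `str`.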