Python UTF-16 output and Windows line endings: a bug?

Asked 2025-04-15 13:06

With this code:

test.py

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

print "test1"
print "test2"

which I run like this:

test.py > test.txt

On Windows 2000 with Python 2.6, each newline is written as the byte sequence \x0D\x0A\x00, which is clearly wrong for UTF-16.

Am I missing something, or is this a bug?

windows utf-16 byte order mark newline characters encoding issues software bugs

3 Answers

So far I have found two workarounds, but neither produces UTF-16 output with Windows-style line endings.

First, to send the output of Python print statements to a file encoded as UTF-16 (this writes Unix-style line endings):

import sys
import codecs

sys.stdout = codecs.open("outputfile.txt", "w", encoding="utf16")

print "test1"
print "test2"

Second, to write UTF-16 to stdout without the newline translation corrupting the output (this also writes Unix-style line endings; thanks to this ActiveState recipe):

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

print "test1"
print "test2"
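
For the file case, one option not used above is the io module, which translates newlines on the text side before encoding. This is only a minimal sketch, assuming Python 2.6 or later (where io.open is available) or Python 3; the helper name write_utf16_crlf and the temporary path are hypothetical and chosen for illustration.

```python
import io
import os
import tempfile


def write_utf16_crlf(path, lines):
    # newline="\r\n" makes the text layer turn each "\n" into "\r\n"
    # before the characters are encoded as UTF-16, so both halves of
    # the line ending become proper two-byte code units.
    with io.open(path, "w", encoding="utf-16", newline="\r\n") as f:
        for line in lines:
            f.write(line + u"\n")


# Hypothetical output location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "outputfile.txt")
write_utf16_crlf(path, [u"test1", u"test2"])

# Read the raw bytes back to check what was actually written.
with io.open(path, "rb") as f:
    data = f.read()

print(repr(data.decode("utf-16")))
```

The file starts with a byte order mark because the "utf-16" codec adds one, and decoding the bytes gives back text with \r\n line endings.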

Answered 2025-04-15 by Python大师

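The newline translation happens inside the underlying standard output file. You write "test1\n" to sys.stdout (now a stream writer). The stream writer encodes it as "t\x00e\x00s\x00t\x001\x00\n\x00" and passes those bytes on to the real file, the original sys.stdout.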

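That file does not know you have converted the data to UTF-16. It is open in text mode, so all it knows is that every \n byte in the output stream must become \x0D\x0A, which gives exactly the output you are seeing.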
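
The effect can be reproduced without Windows by doing the byte replacement by hand. This is a small sketch, assuming little-endian UTF-16 without a byte order mark so the bytes stay easy to read; the variable names are for illustration only.

```python
# Simulate what a text-mode stdout does to already-encoded UTF-16 bytes.
text = u"test1\n"

# Encode as UTF-16 little-endian without a BOM.
encoded = text.encode("utf-16-le")

# Text mode on Windows replaces every 0x0A byte with 0x0D 0x0A,
# unaware that 0x0A is only half of a two-byte code unit.
corrupted = encoded.replace(b"\n", b"\r\n")

# A correct UTF-16 CRLF line ending encodes "\r" and "\n" separately.
expected = u"test1\r\n".encode("utf-16-le")

print(repr(encoded))
print(repr(corrupted))
print(repr(expected))
```

The corrupted result ends in \x0D\x0A\x00 and has an odd number of bytes, so every code unit after the first newline is misaligned. The correct encoding ends in \x0D\x00\x0A\x00.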

Answered 2025-04-15 by Python大师

Try this:

import sys
import codecs

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace("\n", "\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)

sys.stdout = CRLFWrapper(codecs.getwriter('utf-16')(sys.stdout))
print "test1"
print "test2"
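
To check the wrapper without redirecting the real stdout, it can be pointed at an in-memory buffer. This is a sketch under a few assumptions: io.BytesIO stands in for a binary-mode stdout, the "utf-16-le" codec is used so that no byte order mark appears in the output, and the class is repeated here with unicode literals so the block runs on its own.

```python
import codecs
import io


class CRLFWrapper(object):
    """Turn "\n" into "\r\n" before the text reaches the encoder."""

    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace(u"\n", u"\r\n"))

    def __getattr__(self, key):
        # Delegate everything else (flush, close, ...) to the wrapped stream.
        return getattr(self.output, key)


raw = io.BytesIO()  # stand-in for stdout switched to binary mode
out = CRLFWrapper(codecs.getwriter("utf-16-le")(raw))

out.write(u"test1\n")
out.write(u"test2\n")

data = raw.getvalue()
print(repr(data))
```

Because the replacement happens on text rather than bytes, each line ending is encoded as \x0D\x00\x0A\x00. One caveat: text that already contains "\r\n" would come out as "\r\r\n", so the wrapper expects input that uses "\n" only.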

Answered 2025-04-15 by Python大师
