How do I make Popen() handle UTF-8 correctly?

6 votes

3 answers

10882 views

Asked 2025-04-16 05:26

Here is my Python code:

[...]
proc = Popen(path, stdin=stdin, stdout=PIPE, stderr=PIPE)
result = [x for x in proc.stdout.readlines()]
result = ''.join(result);

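Everything works fine with ASCII characters. But when stdout contains UTF-8 text, the result becomes unpredictable, and in most cases the output is garbled. What is going wrong here?

By the way, is there anything in this code that could be improved?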

3 Answers

I ran into the same problem when using LogPipe.

I solved it by passing the extra arguments encoding='utf-8', errors='ignore' to fdopen().

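# https://codereview.stackexchange.com/questions/6567/redirecting-subprocesses-output-stdout-and-stderr-to-the-logging-module
import logging
import os
import threading

# Assumption: vlogger is not defined in the original snippet; any logging.Logger will do.
vlogger = logging.getLogger(__name__)

class LogPipe(threading.Thread):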
    def __init__(self):
        """Setup the object with a logger and a loglevel
        and start the thread
        """
        threading.Thread.__init__(self)
        self.daemon = False
        # self.level = level
        self.fdRead, self.fdWrite = os.pipe()
        self.pipeReader = os.fdopen(self.fdRead, encoding='utf-8', errors='ignore')  # set utf-8 encoding and just ignore illegal character
        self.start()

    def fileno(self):
        """Return the write file descriptor of the pipe
        """
        return self.fdWrite

    def run(self):
        """Run the thread, logging everything.
        """
        for line in iter(self.pipeReader.readline, ''):
            # vlogger.log(self.level, line.strip('\n'))
            vlogger.debug(line.strip('\n'))

        self.pipeReader.close()

    def close(self):
        """Close the write end of the pipe.
        """
        os.close(self.fdWrite)
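
As a smaller illustration of the same fix, the sketch below pushes raw bytes through an os.pipe() and reads them back as text with os.fdopen(..., encoding='utf-8', errors='ignore'). The helper name read_pipe_as_text is hypothetical and not part of LogPipe; it only shows that bytes which are not valid UTF-8 are silently dropped instead of raising UnicodeDecodeError.

```python
import os


def read_pipe_as_text(data: bytes) -> str:
    """Write raw bytes into a pipe and read them back as UTF-8 text.

    Hypothetical helper for illustration only; keep `data` small so the
    write does not block on a full pipe buffer.
    """
    fd_read, fd_write = os.pipe()
    os.write(fd_write, data)
    os.close(fd_write)  # closing the write end lets the reader see EOF
    # Same arguments as in LogPipe: decode as UTF-8, drop undecodable bytes.
    with os.fdopen(fd_read, encoding='utf-8', errors='ignore') as reader:
        return reader.read()


# b'\xff' can never appear in valid UTF-8, so it is dropped from the result.
decoded = read_pipe_as_text('日志 ok\n'.encode('utf-8') + b'\xff')
```

Note that errors='ignore' hides corrupt input. Using errors='replace' instead would mark each bad byte with U+FFFD, which may be easier to debug.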

Answered 2025-04-16 by Python大师

Short answer

Set the PYTHONIOENCODING environment variable, and set the encoding in Popen:

#tst1.py
import subprocess
import sys, os

#print(sys.stdout.encoding)      # output: utf-8 (the default for an interactive console)
os.environ['PYTHONIOENCODING'] = 'utf-8'  # inherited by the child process
p = subprocess.Popen(['python', 'tst2.py'], encoding='utf-8', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
#print(p.stdout)                                        # output: <_io.TextIOWrapper name=3 encoding='utf-8'>
#print(p.stdout.encoding, '  ', p.stderr.encoding)       # output: utf-8    utf-8
outs, errors = p.communicate()
print(outs, errors)

Here tst1.py runs another Python script, tst2.py, which looks like this:

#tst2.py
import sys

print(sys.stdout.encoding)      #output: utf-8
print('\u2e85')  # a CJK radical character (U+2E85)

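Detailed answer

Using PIPE means that a pipe to the standard stream is opened. A pipe is a one-way data channel used for inter-process communication. Pipes carry binary data and have no notion of encoding. If the applications on the two ends of a pipe exchange text, they must agree on the text encoding.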
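
A short sketch of why both ends must agree: the same bytes decode to different text depending on the codec the reader picks. The sample string is an arbitrary choice made for illustration.

```python
text = 'é'
raw = text.encode('utf-8')         # what the writer puts on the pipe: b'\xc3\xa9'

agreed = raw.decode('utf-8')       # reader uses the same encoding as the writer
mismatched = raw.decode('cp1252')  # reader assumes a Windows code page instead

# ascii() is used so the output does not depend on the console encoding.
print(ascii(agreed))      # '\xe9' (one character, é)
print(ascii(mismatched))  # '\xc3\xa9' (two characters, Ã and ©)
```

The pipe delivered identical bytes in both cases; only the reader's choice of codec differs.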

So first, the stdout of tst2.py must use UTF-8 encoding; otherwise printing the character can fail with an error such as:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2e85' in position 0: character maps to <undefined>

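The sys.stdout and sys.stderr streams are regular text files, like those returned by open(). On Windows, non-character devices such as pipes and disk files use the system locale encoding (that is, an ANSI code page such as CP1252). On all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before starting the interpreter.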
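
To see which encoding a child interpreter picks for a piped stdout, a sketch like the following can help. The helper child_stdout_encoding is hypothetical; it starts sys.executable with -c and optionally sets PYTHONIOENCODING in the child's environment. The result without the variable depends on the platform and Python version, so no output is claimed for that case.

```python
import os
import subprocess
import sys


def child_stdout_encoding(ioencoding=None):
    """Return sys.stdout.encoding as seen by a child whose stdout is a pipe."""
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)  # ignore any value set in the parent
    if ioencoding is not None:
        env['PYTHONIOENCODING'] = ioencoding
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # Encoding names are plain ASCII, so decoding the bytes as ASCII is safe.
    return result.stdout.decode('ascii').strip()


default_encoding = child_stdout_encoding()        # platform dependent
forced_encoding = child_stdout_encoding('utf-8')  # overridden by the variable
```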

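Second, tst1.py needs to know how to decode the data it reads from the pipe, so encoding='utf-8' must be set in Popen.

More details

In Python 3.6 and later, following PEP 528, the default encoding of the interactive console on Windows is utf-8 (this can be changed by setting both PYTHONIOENCODING and PYTHONLEGACYWINDOWSSTDIO). This does not apply to pipes and redirection.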
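
Putting both steps together, here is a minimal self-contained sketch. Instead of a separate tst2.py it passes the child's code with -c (an assumption made so the example runs on its own), and it uses sys.executable rather than relying on python being on PATH. The helper name run_python_utf8 is hypothetical.

```python
import os
import subprocess
import sys


def run_python_utf8(code):
    """Run `code` in a child Python and return its stdout decoded as UTF-8."""
    # Step 1: make the child encode its stdout as UTF-8.
    env = dict(os.environ, PYTHONIOENCODING='utf-8')
    # Step 2: make the parent decode the pipe as UTF-8.
    proc = subprocess.Popen(
        [sys.executable, '-c', code],
        env=env,
        encoding='utf-8',
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    outs, _errs = proc.communicate()
    return outs


# The raw string keeps the command line pure ASCII; the child expands the escape.
output = run_python_utf8(r"print('\u2e85')")
```

Passing env= to Popen scopes the variable to the child only, whereas assigning to os.environ changes it for the parent and every later child as well.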

Answered 2025-04-16 by Python大师

Have you tried decoding your strings first and then joining the resulting Unicode strings together? In Python 2.4 and later, this can be done as follows:

result = [x.decode('utf8') for x in proc.stdout.readlines()]

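The important point here is that your variable x is a sequence of bytes that needs to be interpreted as characters. The decode() method performs that interpretation (here assuming the bytes are UTF-8 encoded): in Python 2, x.decode('utf8') returns a unicode object (a str in Python 3), which you can think of as a "string of characters" (as opposed to a "string of numbers between 0 and 255 [bytes]").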
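
In Python 3 the same idea can be sketched as follows, which also touches on the question about tidying up the original code: communicate() collects all output and waits for the process, and the bytes are decoded once. The helper name run_and_decode is hypothetical, and UTF-8 output from the child is an assumption.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run `cmd` and return (stdout, stderr) decoded from UTF-8 bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes and waits for exit, which avoids the
    # deadlock that can occur when one pipe fills while the other is being read.
    out_bytes, err_bytes = proc.communicate()
    # Strict decoding for stdout; errors='replace' keeps diagnostics readable.
    return out_bytes.decode('utf-8'), err_bytes.decode('utf-8', errors='replace')


# The child writes raw UTF-8 bytes, so the result does not depend on its locale.
child_code = r"import sys; sys.stdout.buffer.write(b'\xe4\xb8\xad\xe6\x96\x87')"
out, err = run_and_decode([sys.executable, '-c', child_code])
```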

Answered 2025-04-16 by Python大师
