Unicode error when redirecting Python script output to a file

Traceback (most recent call last):
  File "script.py", line 70, in <module>
    '"' + desc.decode('utf-8', errors='ignore') + '")'
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 264: ordinal not in range(128)

3 Answers

User

#1 · Edited 2024-05-13 02:03:26

It makes no sense to convert text to unicode just in order to print it. Work with your data as unicode, and convert it to some encoding only for output.

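What your code does: you're on Python 2, so your default string type (str) is a byte string. In your statement you start with some UTF-8 encoded byte strings, convert them to unicode, and surround them with quotes (regular str values that are coerced to unicode so everything can be combined into one string). You then pass this unicode string to print, which pushes it to sys.stdout. To do that, it has to turn the string into bytes. If you are writing to the Windows console, Python can negotiate an encoding with it, but if you redirect to a plain dumb file, it falls back on ASCII and complains, because there is no lossless way to do that conversion.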
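The failure can be reproduced without print or a redirect. This is only a minimal sketch: it assumes the implicit conversion behaves like an explicit encode to ASCII, and the sample text and variable names are illustrative, not taken from the question's script.

```python
# -*- coding: utf-8 -*-
# Illustrative text containing U+00F8, the character named in the traceback.
text = u"sm\xf8rrebr\xf8d"

try:
    # Roughly what happens implicitly when a unicode string has to become
    # bytes and no output encoding is known.
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding explicitly to a codec that covers the character succeeds.
utf8_bytes = text.encode("utf-8")

print(error_message)
print(repr(utf8_bytes))
```

The error message names the same 'ascii' codec as the traceback in the question, while the explicit UTF-8 encode goes through.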

Solution: don't give print a unicode string. Encode it yourself to the representation of your choice:

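# -*- coding: utf-8 -*-
# Python 2: the literals below are UTF-8 byte strings (hence the coding line above),
# so decode them to unicode first, then encode explicitly for output.
print "Latin-1:", "unicode über alles!".decode('utf-8').encode('latin-1')
print "Utf-8:", "unicode über alles!".decode('utf-8').encode('utf-8')
print "Windows:", "unicode über alles!".decode('utf-8').encode('cp1252')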

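All of these should work without complaint when you redirect. The output probably won't look right on your screen, but open the output file with Notepad or something similar and check whether the editor is set to read that format. (UTF-8 is the only one that has a hope of being detected automatically. cp1252 is a likely Windows default.)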
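To see why only UTF-8 has a chance of being detected, it can help to look at the bytes each encoding produces. The following is a small sketch with illustrative variable names; it only compares the byte strings and does not model how any particular editor guesses an encoding.

```python
# -*- coding: utf-8 -*-
text = u"unicode \xfcber alles!"  # \xfc is U+00FC, the "u" with umlaut

latin1_bytes = text.encode("latin-1")
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# Latin-1 and cp1252 agree on this character: a single byte, 0xFC.
print(repr(latin1_bytes))
print(repr(cp1252_bytes))
# UTF-8 uses two bytes for it: 0xC3 0xBC.
print(repr(utf8_bytes))

# Reading UTF-8 bytes with a single-byte codec never fails, it just
# produces the wrong characters.
misread = utf8_bytes.decode("latin-1")

# A lone 0xFC byte, on the other hand, is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

Because arbitrary single-byte text is usually not valid UTF-8, a successful UTF-8 decode is a fairly strong hint, whereas Latin-1 and cp1252 accept almost any byte sequence and cannot be told apart this way.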

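Once you've got that working, clean up your code and avoid using print for file output. Use the codecs module and open files with codecs.open instead of plain open.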
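A rough sketch of that approach: write unicode through codecs.open and read it back. The temporary directory, the file name and the variable names are placeholders chosen for this example, not part of the original script.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import shutil
import tempfile

text = u"unicode \xfcber alles!"

# Hypothetical output location; a temp directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "out.txt")

# codecs.open returns a wrapper that encodes unicode to UTF-8 on write.
with codecs.open(out_path, "w", encoding="utf-8") as out_file:
    out_file.write(text + u"\n")

# Reading through the same codec gives unicode back.
with codecs.open(out_path, "r", encoding="utf-8") as in_file:
    round_tripped = in_file.read()

# The file on disk holds plain UTF-8 bytes.
with open(out_path, "rb") as raw_file:
    raw_bytes = raw_file.read()

shutil.rmtree(out_dir)  # remove the temporary directory again
```

With this pattern the encoding is chosen once, where the file is opened, and the rest of the code only handles unicode.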

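PS. If you're decoding a UTF-8 string, the conversion to unicode should be lossless: you don't need the errors='ignore' flag. That flag is appropriate when you convert to ASCII or Latin-2 or whatever, and you just want to drop characters that don't exist in the target code page.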
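The difference between the two directions can be sketched as follows, assuming the input really is valid UTF-8 (the sample bytes and names are illustrative):

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for the text "unicode über alles!"
utf8_bytes = b"unicode \xc3\xbcber alles!"

# Valid UTF-8 decodes without loss, so no error handler is needed here.
text = utf8_bytes.decode("utf-8")
round_trip_ok = text.encode("utf-8") == utf8_bytes

# Encoding to a narrower target is where an error handler matters.
dropped = text.encode("ascii", "ignore")    # the character is silently removed
replaced = text.encode("ascii", "replace")  # the character becomes "?"

print(repr(dropped))
print(repr(replaced))
```

Passing 'ignore' while decoding would only hide malformed input; it is on the encode side, toward a smaller character set, that dropping characters can be a deliberate choice.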

User

#2 · Edited 2024-05-13 02:03:26

Windows behaviour in this case is a bit complicated. You should follow the other advice here: use unicode for strings internally and decode on input.

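As for your question: when stdout is redirected, you need to print encoded strings (only you know which encoding!), but for plain screen output you have to print unicode strings (and Python or the Windows console handles the conversion to the proper encoding).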

I recommend structuring your script like this:

# -*- coding: utf-8 -*- 
import sys, codecs
# set up output encoding
if not sys.stdout.isatty():
    # here you can set encoding for your 'out.txt' file
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)

# next, you will print all strings in unicode
print u"Unicode string ěščřžý"
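What the wrapper created by codecs.getwriter does can be tried in isolation. In this sketch an in-memory io.BytesIO object stands in for a redirected stdout; that substitution and the variable names are assumptions made for illustration only.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Stand-in for a redirected stdout: a plain byte stream with no encoding.
byte_stream = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf8")(byte_stream)

# Unicode goes in, UTF-8 encoded bytes come out on the underlying stream.
message = u"Unicode string \u011b\u0161\u010d\u0159\u017e\xfd\n"
writer.write(message)

written = byte_stream.getvalue()
print(repr(written))
```

The wrapper encodes on every write, so the code that prints never has to call encode itself.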

UPDATE: See also this similar question: Setting the correct encoding when piping stdout in Python

User

#3 · Edited 2024-05-13 02:03:26

You can use the codecs module to write unicode data to the file:

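import codecs
# codecs.open encodes unicode objects to UTF-8 on write; the with block closes the file
with codecs.open("out.txt", "w", "utf-8") as out_file:
    out_file.write(something)  # `something` is your unicode string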

"print" writes to standard output, and if your console doesn't support UTF-8 it can cause this kind of error even when you pipe stdout to a file.
