Python Unicode Encode Error

116 votes

9 answers

293126 views

Asked 2025-04-16 01:08

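I am reading and parsing an Amazon XML file. The file contains a ’ (U+2019, right single quotation mark), and when I try to print it I get the following error: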

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

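From what I have read online, the cause is that the XML file is encoded in UTF-8 while Python tries to handle it as ASCII. Is there a simple way to make the error go away so that my program prints the XML as it reads it?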

Tags: data-processing, utf-8, xml-parsing, encoding-error, unicode

9 Answers

Do not hardcode the character encoding of your environment inside your script; print the Unicode text directly:

assert isinstance(text, unicode) # or str on Python 3
print(text)

If your output is redirected to a file (or a pipe), you can use the PYTHONIOENCODING environment variable to specify the character encoding:

$ PYTHONIOENCODING=utf-8 python your_script.py >output.utf8

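Otherwise, running python your_script.py should work as is: your locale settings are used to encode the text (on POSIX, check the LC_ALL, LC_CTYPE, and LANG environment variables, and set LANG to a UTF-8 locale if necessary).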
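To see which encoding Python has actually picked up from the environment, something like the following can help. This is a minimal sketch for Python 3; the helper name describe_output_encoding is made up for illustration and is not part of the original answer.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence how print() encodes text.

    Hypothetical helper for illustration only.
    """
    return {
        # Encoding of the stdout stream; may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale (LC_ALL / LC_CTYPE / LANG on POSIX).
        "preferred": locale.getpreferredencoding(False),
        # Explicit override, if the variable is set in the environment.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(name, "=", value)
```

If "stdout" reports an ASCII-only encoding (for example when output is piped under a C locale), printing U+2019 will fail as in the question, and setting PYTHONIOENCODING or a UTF-8 locale is the fix described above.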

To print Unicode on Windows, see this answer, which shows how to print Unicode to the Windows console, to a file, or in IDLE.

Answered 2025-04-16 by Python大师


A better solution:

# Python 2 only: unicode() does not exist on Python 3.
if isinstance(value, str):
    # Ignore errors even if the string is not proper UTF-8 or has
    # broken marker bytes.
    # Python built-in function unicode() can do this.
    value = unicode(value, "utf-8", errors="ignore")
else:
    # Assume the value object has proper __unicode__() method
    value = unicode(value)

If you want to understand more about why, read here:

http://docs.plone.org/manage/troubleshooting/unicode.html#id1
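The snippet above is Python 2 code. A rough Python 3 equivalent might look like the sketch below; the function name to_text is hypothetical, and it assumes the incoming bytes are meant to be UTF-8.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, dropping bytes that cannot be decoded.

    Hypothetical helper: on Python 3, bytes plays the role of the
    Python 2 str type and str plays the role of unicode.
    """
    if isinstance(value, bytes):
        # errors="ignore" silently discards invalid byte sequences.
        return value.decode(encoding, errors="ignore")
    # Anything else is converted through its __str__() method.
    return str(value)


print(to_text(b"Amazon\xe2\x80\x99s"))  # valid UTF-8 for U+2019
print(to_text(b"abc\xff"))              # the stray 0xff byte is dropped
```

Note that errors="ignore" loses data without any warning, so errors="replace" may be the safer choice when you need to notice bad input.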

Answered 2025-04-16 by Python大师



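Most likely, you have parsed the content successfully and the problem only appears when you print the XML contents, because they contain some non-ASCII Unicode characters. You can try encoding your Unicode string as ASCII first: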

unicodeData.encode('ascii', 'ignore')

The 'ignore' argument tells the codec to skip the characters it cannot encode. From the Python documentation:

>>> # Python 3 session. On Python 2, build u with unichr() instead of chr().
>>> u = chr(40960) + u'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
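Applied to the character from the question (U+2019), the same error handlers behave as in the sketch below. The helper name encode_for_ascii and the sample string are made up for illustration; the error-handler names are the standard ones accepted by str.encode.

```python
def encode_for_ascii(text, errors="backslashreplace"):
    """Encode text to ASCII bytes without raising on non-ASCII characters.

    Hypothetical helper wrapping str.encode with a chosen error handler.
    """
    return text.encode("ascii", errors)


sample = "Amazon\u2019s"  # contains the right single quotation mark

for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, encode_for_ascii(sample, handler))
```

Every one of these handlers loses or rewrites the original character, so they are mostly useful for logging or debugging. If the terminal can display UTF-8, printing the text directly keeps it intact.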

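You may want to read this article: http://www.joelonsoftware.com/articles/Unicode.html, which I found very useful as a basic tutorial on what is going on. After reading it, you will stop feeling like you are guessing which commands to use (at least that is what happened to me).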

Answered 2025-04-16 by Python大师

