Any gotchas using unicode_literals in Python 2.6?

asked 2025-04-15 11:20

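We've got our code base running under Python 2.6. In order to prepare for Python 3.0, we've started adding:

from __future__ import unicode_literals

to our .py files (as we modify them). I'm wondering if anyone else has been doing this and has run into any non-obvious gotchas (perhaps after spending a lot of time debugging).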

python-3.0 porting encoding python-2.6 unicode-literals

6 Answers

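I found that if you add the unicode_literals directive, you also need to add something like the following as the first or second line of your .py file. (Strictly speaking, Python 2 requires such a declaration whenever a source file contains non-ASCII bytes, with or without unicode_literals.)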

 # -*- coding: utf-8

Otherwise code such as:

 foo = "barré"

results in an error like:

SyntaxError: Non-ASCII character '\xc3' in file mumble.py on line 198,
 but no encoding declared; see http://www.python.org/peps/pep-0263.html 
 for details
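A minimal sketch of why the declaration matters: the coding cookie tells the compiler how to decode the file's bytes, and with unicode_literals those decoded characters go straight into your string literals. The sketch compiles the same source bytes under two different declarations and compares the resulting value of foo. The `mumble.py` filename and the `source_with`/`load_foo` helpers are hypothetical, and the code is written to run on both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# UTF-8 bytes for a module containing  foo = 'barré'  (written with escapes
# so that this file itself stays pure ASCII).
BODY = b"from __future__ import unicode_literals\nfoo = 'barr\xc3\xa9'\n"


def source_with(encoding):
    """Hypothetical helper: prepend a PEP 263 coding declaration to BODY."""
    cookie = ("# -*- coding: %s -*-\n" % encoding).encode("ascii")
    return cookie + BODY


def load_foo(encoding):
    """Compile and execute the source, returning the value bound to foo."""
    namespace = {}
    code = compile(source_with(encoding), "mumble.py", "exec")
    exec(code, namespace)
    return namespace["foo"]


# Correct declaration: the two bytes \xc3\xa9 decode to the single character é.
utf8_value = load_foo("utf-8")
# Wrong declaration: the same two bytes are read as two Latin-1 characters.
latin1_value = load_foo("latin-1")

print(repr(utf8_value))
print(repr(latin1_value))
```

On Python 2, dropping the cookie line entirely makes compile() raise the SyntaxError shown above; Python 3 instead assumes UTF-8 source by default (PEP 3120), so the declaration becomes optional there.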

answered 2025-04-15 by Python大师

In Python 2.6 (prior to Python 2.6.5 RC1+), unicode strings don't play nicely with keyword arguments (see this issue for details).

For example, the following snippet works without unicode_literals, but fails with "TypeError: keywords must be strings" once unicode_literals is in effect:

  >>> from __future__ import unicode_literals
  >>> def foo(a=None): pass
  ...
  >>> foo(**{'a':1})
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: foo() keywords must be strings
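If you have to support Python 2.6 releases that still have this bug, one possible workaround (a sketch, not the only option) is to convert the dictionary's keys to the native str type before unpacking them. `str_keys` is a hypothetical helper name; on Python 3, and on 2.6 releases with the fix, it is effectively a no-op.

```python
from __future__ import unicode_literals


def foo(a=None):
    return a


def str_keys(kwargs):
    """Hypothetical helper: return a copy of kwargs with native-str keys.

    Under unicode_literals, {'a': 1} has a unicode key on Python 2, which
    affected Python 2.6 releases reject in **kwargs. On Python 2, str()
    turns an ASCII unicode key into a byte string; on Python 3 it simply
    returns the same text.
    """
    return dict((str(key), value) for key, value in kwargs.items())


options = {'a': 1}
result = foo(**str_keys(options))
print(result)
```

The dict(generator) form is used instead of a dict comprehension because dict comprehensions only arrived in Python 2.7.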

answered 2025-04-15 by Python大师

The main source of problems I've had working with unicode strings is mixing utf-8 encoded strings with unicode ones.

For example, consider the following two scripts.

two.py

# encoding: utf-8
name = 'helló wörld from two'

one.py

# encoding: utf-8
from __future__ import unicode_literals
import two
name = 'helló wörld from one'
print name + two.name

The output of running python one.py is:

Traceback (most recent call last):
  File "one.py", line 5, in <module>
    print name + two.name
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

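In this example, two.name is a utf-8 encoded string (not unicode) because that module did not import unicode_literals, whereas name in one.py is a unicode string. When you mix the two, Python tries to decode the encoded string into unicode using its default codec (ascii), which fails on the first non-ASCII byte. It works if you decode explicitly: print name + two.name.decode('utf-8').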
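A minimal sketch of the explicit-decode fix, written to run on Python 2.6+ and Python 3. Instead of importing real modules, it assumes two stand-in values: `one_name` plays the role of the unicode name in one.py and `two_name` the utf-8 encoded two.name. The `to_text` helper is hypothetical; it decodes byte strings and passes text through unchanged, so it is safe to call on either kind of value.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Stand-ins for the two modules: one_name is unicode text (because of
# unicode_literals); two_name is UTF-8 encoded bytes, which is what
# two.name is on Python 2 without unicode_literals.
one_name = 'hell\u00f3 w\u00f6rld from one'
two_name = b'hell\xc3\xb3 w\xc3\xb6rld from two'


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, return text unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2.6+
        return value.decode(encoding)
    return value


# Mixing the two types directly fails: UnicodeDecodeError on Python 2
# (implicit ascii decode), TypeError on Python 3 (no implicit conversion).
try:
    one_name + two_name
    mixed_ok = True
except (TypeError, UnicodeDecodeError):
    mixed_ok = False

# Decoding explicitly first, as with two.name.decode('utf-8'), works on both.
combined = one_name + to_text(two_name)
print(repr(combined))
```

repr() is printed rather than the string itself so the sketch does not depend on the terminal's encoding.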

The same thing happens if you encode a string and later mix it with unicode strings. For example, this works:

# encoding: utf-8
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

DEBUG: <html><body>helló wörld</body></html>

But it stops working once you add the unicode_literals import:

# encoding: utf-8
from __future__ import unicode_literals
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print 'DEBUG: %s' % html
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

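It fails because 'DEBUG: %s' is now a unicode string, so Python tries to decode html (again with the default ascii codec). There are a couple of ways to fix the print: print str('DEBUG: %s') % html or print 'DEBUG: %s' % html.decode('utf-8').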
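Both fixes above are specific to Python 2. On Python 3, formatting bytes with %s into a text template does not raise; it silently inserts the bytes repr (b'...'). A common way to avoid this whole class of bug on both versions is the so-called "unicode sandwich": decode bytes as early as possible, work with text internally, and encode only at the output boundary. The sketch below assumes utf-8 throughout, and `render_debug` is a hypothetical helper name.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def render_debug(html, encoding='utf-8'):
    """Hypothetical helper: build the debug line as text, encode it once.

    Accepts html as either text or encoded bytes and returns encoded bytes
    ready to be written to a file, socket or other byte stream.
    """
    if isinstance(html, bytes):
        html = html.decode(encoding)  # decode at the input boundary
    message = 'DEBUG: %s' % html      # text % text: no implicit decoding
    return message.encode(encoding)   # encode at the output boundary


page = '<html><body>hell\u00f3 w\u00f6rld</body></html>'
from_text = render_debug(page)
from_bytes = render_debug(page.encode('utf-8'))
print(repr(from_text))
```

Because the template and the value are always both text inside the helper, the result is the same whether the caller passes unicode or already-encoded bytes.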

I hope this helps you understand the pitfalls you can run into when working with unicode strings.

answered 2025-04-15 by Python大师