UTF-8 error with Python and gettext

6 votes

3 answers

6217 views

Asked 2025-04-16 15:05

My editor uses UTF-8, so every string shown here is also UTF-8 encoded in the file.

I have a Python script like this:

# -*- coding: utf-8 -*-
...
parser = optparse.OptionParser(
  description=_('automates the dice rolling in the classic game "risk"'), 
  usage=_("usage: %prog attacking defending"))

I then extracted everything with xgettext and got a .pot file, which boils down to:

"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr ""

After that, I used msginit to generate a de.po file, which I filled in like this:

"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr "automatisiert das Würfeln bei \"Risiko\""

When I run the script, I get the following error:

  File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)

How can I fix this?

utf-8 gettext xgettext pot-file msginit character-encoding-error

3 Answers

I'm not very familiar with this, but it looks like a known bug in 2.6 that was fixed in 2.7:

http://bugs.python.org/issue2931

If you can't use 2.7, try this workaround:

http://mail.python.org/pipermail/python-dev/2006-May/065458.html

Answered 2025-04-16 by Python大师


I suspect the problem is that _("string") returns a byte string rather than a Unicode string.

An obvious workaround is:

parser = optparse.OptionParser(
        description=_('automates the dice rolling in the classic game "risk"').decode('utf-8'),
        usage=_("usage: %prog attacking defending").decode('utf-8'))

That doesn't feel right, though.

ugettext or install(True) might help.

The Python gettext documentation gives these examples:

import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.ugettext

Or:

import gettext
gettext.install('myapplication', '/usr/share/locale', unicode=1)

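I'm trying to reproduce your problem, and even with install(unicode=1) I still get a byte string (type str).

Either I'm using gettext incorrectly, or my .po/.mo files are missing the character encoding declaration.

I'll update once I know more.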
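One quick way to rule out a missing declaration is to look at the Content-Type header of the catalog source. The sketch below is only an illustration: po_charset and is_declared are hypothetical helpers, not part of gettext, and the sample headers are the two lines quoted in the question. It is written for Python 3 and treats the CHARSET placeholder seen in the .pot file as "not declared".

```python
import re

def po_charset(po_text):
    """Return the charset named in a .po/.pot Content-Type header, or None.

    Hypothetical helper for illustration; it only scans the text and does
    not parse the catalog the way gettext itself does.
    """
    match = re.search(r'Content-Type:[^"\n]*charset=([^\s"\\]+)', po_text)
    return match.group(1) if match else None

def is_declared(charset):
    """Treat a missing header or the CHARSET placeholder as undeclared."""
    return charset is not None and charset != "CHARSET"

# Header lines copied from the question (raw strings keep the literal \n).
pot_header = r'"Content-Type: text/plain; charset=CHARSET\n"'
po_header = r'"Content-Type: text/plain; charset=UTF-8\n"'

pot_charset = po_charset(pot_header)
de_charset = po_charset(po_header)

print(pot_charset, is_declared(pot_charset))
print(de_charset, is_declared(de_charset))
```

If the de.po file already declares UTF-8, as it does in the question, the declaration itself is probably not the culprit and the str-versus-unicode return type is the more likely cause.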

xlt = _('automates the dice rolling in the classic game "risk"')
print type(xlt)
if isinstance(xlt, str):
    print 'gettext returned a str (wrong)'
    print xlt
    print xlt.decode('utf-8').encode('utf-8')
elif isinstance(xlt, unicode):
    print 'gettext returned a unicode (right)'
    print xlt.encode('utf-8')

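(Another option is to use escape sequences or Unicode code points in the .po file, but that doesn't sound like much fun.)

(Or you could look at your system's .po files to see how they handle non-ASCII characters.)

Answered 2025-04-16 by Python大师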


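The error means you are calling encode on a byte string, so Python first tries to decode it to Unicode using the system default encoding (ascii in Python 2) and then re-encodes it with the encoding you specified.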
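To make that hidden step visible, here is a small sketch written for Python 3, where the decode has to be spelled out. implicit_encode is a hypothetical function that imitates what Python 2 does when encode is called on a str; the sample string is the German translation from the question.

```python
# The translated message, as UTF-8 bytes (what a byte-string _() would return).
translated = 'automatisiert das Würfeln bei "Risiko"'
raw = translated.encode("utf-8")

def implicit_encode(byte_string, encoding):
    """Imitate Python 2's str.encode(): decode as ASCII first, then encode.

    Hypothetical helper; Python 3 has no such implicit step.
    """
    return byte_string.decode("ascii").encode(encoding, "replace")

try:
    implicit_encode(raw, "utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # The ASCII decode fails on the first byte of the UTF-8 encoded "ü".
    error_message = str(exc)

print(error_message)
```

The byte 0xc3 named in the traceback is the first byte of the UTF-8 encoding of "ü", which is why the failure only shows up once the translation contains a non-ASCII character.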

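The usual fix is to decode the string with s.decode('utf-8') (or whatever encoding the string actually uses) before using it. You could also try Unicode literals directly, such as u'automates...' (this depends on how strings are substituted from the .po file, which I don't know much about).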
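As a rough illustration of decoding first, the sketch below (Python 3 syntax) uses a hypothetical to_text helper that turns byte strings into text and leaves text alone. The final encode call uses the same "replace" error handler that appears in the optparse traceback, with ascii chosen here only as an example target encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding byte strings with the given encoding.

    Hypothetical helper; the default of UTF-8 is an assumption that matches
    the charset declared in the question's de.po file.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

raw = 'automatisiert das Würfeln bei "Risiko"'.encode("utf-8")

text = to_text(raw)             # bytes in, text out
same_text = to_text(text)       # text passes through unchanged

# Once the value is text, encoding for output no longer raises; characters
# the target encoding lacks are replaced with "?".
ascii_out = text.encode("ascii", "replace")
print(ascii_out)
```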

This confusing behavior was improved in Python 3, which only tries to convert bytes to Unicode when you explicitly tell it to.

Answered 2025-04-16 by Python大师
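A minimal sketch of that difference, assuming a Python 3 interpreter: mixing text and bytes raises a TypeError instead of silently decoding with ascii, and bytes objects have no encode method at all. The variable names are only illustrative.

```python
text = "Würfeln"
data = text.encode("utf-8")

# Python 3 refuses to combine str and bytes instead of guessing an encoding.
try:
    combined = text + data
except TypeError as exc:
    mixing_error = type(exc).__name__
else:
    mixing_error = None

# bytes has no encode(), so the call in the traceback cannot happen by accident.
bytes_can_encode = hasattr(data, "encode")

# The conversion only happens when requested explicitly.
explicit = text + data.decode("utf-8")

print(mixing_error, bytes_can_encode, explicit)
```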
