Python / Mako: How do I get Unicode strings/characters parsed correctly?

3 votes

1 answer

3813 views

Asked 2025-04-16 04:19

I am trying to get Mako to render some strings that contain Unicode characters:

tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace')
...
print sys.stdout.encoding
uname=cherrypy.session['userName']
print uname
kwargs['_toshow']=uname
...
return tempLook.get_template(page).render(**kwargs)

The relevant template file:

...${_toshow}...

The output is:

UTF-8
Deşghfkskhü
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

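I don't think the string itself is the problem, since I can print it without trouble.

I have tried many combinations of the input_encoding/output_encoding and default_filters parameters, but it always complains that it cannot decode/encode with the ascii codec.

So I decided to try the example I found in the documentation, with these settings: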

input_encoding='utf-8', output_encoding='utf-8'
#(note : it still raised an error without output_encoding, despite tutorial not implying it)

Using

${u"voix m’a réveillé."}

the result is

voix mâ�a rÃ©veillÃ©

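I just can't figure out why this doesn't work. The "magic encoding comment" doesn't help either. All files are encoded in UTF-8.

I have spent hours on this without solving it. Am I missing something?

~~Update:~~

I now have a simpler question:

Now that all variables are Unicode, how do I get Mako to render Unicode strings without doing any processing on them? Passing an empty filter list or using render_unicode() did not help.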

Tags: string-handling, unicode, character-encoding, template-rendering, utf-8, encoding-errors, variable-passing, mako

1 Answer

Right, UTF-8 and Unicode are not the same thing.

UTF-8 is one specific way of encoding strings as bytes, just like ASCII and ISO 8859-1. Try this:

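For every string that comes in, call inputstring.decode('utf-8') (or whatever encoding your input actually uses). For every string that goes out, call outputstring.encode('utf-8') (or whatever output encoding you want). Internally, work only with Unicode strings; in Python 2, 'this is a normal string'.decode('utf-8') == u'this is a normal string'.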
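To make the decode-on-input, encode-on-output rule concrete, here is a minimal sketch. It is written for Python 3, where bytes plays the role of Python 2's plain str and str plays the role of unicode. The helper names to_text and to_bytes are my own illustrations (not part of Mako or the question's code), and the sample bytes are simply the UTF-8 form of the name shown in the question.

```python
# Python 3 sketch: bytes stands in for Python 2's plain str,
# and str stands in for Python 2's unicode.

def to_text(raw, encoding="utf-8"):
    """Decode incoming bytes once, at the boundary; leave text untouched."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def to_bytes(text, encoding="utf-8"):
    """Encode text once, right before it leaves the program."""
    return text.encode(encoding)


# Hypothetical input: the UTF-8 bytes of the user name from the question.
raw_name = b"De\xc5\x9fghfkskh\xc3\xbc"

name = to_text(raw_name)   # text, used everywhere inside the program
wire = to_bytes(name)      # bytes again, only at the output edge
```

The point of routing everything through two small helpers is that decoding happens in exactly one place, so nothing downstream ever has to guess whether it holds bytes or text.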

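'foo' is a plain (byte) string; u'foo' is a Unicode string, which has no "encoding" and so has nothing to decode. Whenever Python 2 needs to convert a plain string to another encoding, it first tries to decode it and then encodes the result. The default codec for that decode step is ascii, so it fails as soon as the string contains any non-ASCII byte :-)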
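The hidden ASCII decode can be reproduced by hand. This is a sketch in Python 3, which no longer does the implicit step, so the function implicit_py2_coercion (a name I made up) performs it explicitly. The last line is only my guess at the garbled output in the question: it looks like UTF-8 bytes read as Latin-1, but nothing in the thread confirms that.

```python
# Python 3 sketch of the step Python 2 performs silently when a byte
# string has to be combined with a unicode string.

# Assumed input: UTF-8 bytes of the user name shown in the question.
raw = "De\u015fghfkskh\u00fc".encode("utf-8")


def implicit_py2_coercion(byte_string):
    """Mimic Python 2's hidden step: decode with the ASCII codec."""
    return byte_string.decode("ascii")


try:
    implicit_py2_coercion(raw)
    error_message = None
except UnicodeDecodeError as exc:
    # Same family of error as the traceback in the question.
    error_message = str(exc)

# Decoding with the encoding the bytes were really written in succeeds.
text = raw.decode("utf-8")

# Guess, not confirmed by the thread: UTF-8 bytes misread as Latin-1
# produce the same kind of garbling as the question's second output.
garbled = "r\u00e9veill\u00e9".encode("utf-8").decode("latin-1")
```

If the guess is right, the two symptoms have the same cause: somewhere a byte string is being decoded with a codec other than the one it was written in.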

Answered 2025-04-16 by Python大师
