How do I convert a UTF string containing Scandinavian characters to ASCII?

Asked 2025-04-15 20:51

I want to convert this string:

foo_utf = u'nästy chäräctörs with å and co.' # unicode

into this:

foo_ascii = 'nästy chäräctörs with å and co.' # ASCII

Does anyone know how to do this in Python (2.6)? I found some material on the unicodedata module, but I can't work out how to do the conversion with it.

character-encoding utf-8 string-conversion ascii unicodedata scandinavian-characters

5 Answers

Python's standard library offers several options in the codecs module. Which one to use depends on how you want the extended characters to be handled:

>>> import codecs
>>> u = u'nästy chäräctörs with å and co.'
>>> encode = codecs.get_encoder('ascii')
>>> encode(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
>>> encode(u, 'ignore')
('nsty chrctrs with  and co.', 31)
>>> encode(u, 'replace')
('n?sty ch?r?ct?rs with ? and co.', 31)
>>> encode(u, 'xmlcharrefreplace')
('n&#228;sty ch&#228;r&#228;ct&#246;rs with &#229; and co.', 31)
>>> encode(u, 'backslashreplace')
('n\\xe4sty ch\\xe4r\\xe4ct\\xf6rs with \\xe5 and co.', 31)

Hopefully one of these options meets your needs. The documentation for Python's codecs module has more details.
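Since the question mentions unicodedata, here is a minimal sketch that combines it with the 'ignore' handler shown above. It assumes that dropping the diacritics (ä to a, ö to o, å to a) is an acceptable result, which the question does not confirm. The helper name strip_diacritics is made up for this illustration.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_diacritics(text):
    """Return an ASCII approximation of a unicode string (hypothetical helper)."""
    # NFKD splits each accented letter into a base letter plus combining marks,
    # e.g. u'\xe4' becomes u'a' followed by u'\u0308'.
    decomposed = unicodedata.normalize('NFKD', text)
    # The combining marks are not ASCII, so 'ignore' drops them
    # and only the base letters remain.
    return decomposed.encode('ascii', 'ignore')


foo_utf = u'nästy chäräctörs with å and co.'
foo_ascii = strip_diacritics(foo_utf)
print(foo_ascii)
```

For the sample string this yields 'nasty charactors with a and co.' (a str in Python 2, bytes in Python 3). Letters that have no decomposition, such as ø and æ, are dropped entirely rather than approximated, so this only suits text where that loss is tolerable.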

Answered 2025-04-15 by Python大师


I don't think you can. Those "nasty characters" cannot be encoded in ASCII, so you will have to pick a different encoding, such as UTF-8, Latin-1, or Windows-1252.
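As a rough illustration of that alternative, the sketch below encodes the sample string from the question with two of the encodings named here and decodes it back. The variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
foo_utf = u'nästy chäräctörs with å and co.'

# UTF-8 can represent every Unicode character; ä, ö and å take two bytes each.
as_utf8 = foo_utf.encode('utf-8')
# Latin-1 covers these Scandinavian letters with one byte each.
as_latin1 = foo_utf.encode('latin-1')

# 31 characters become 36 bytes in UTF-8 and 31 bytes in Latin-1.
print(len(foo_utf))
print(len(as_utf8))
print(len(as_latin1))

# Decoding with the matching codec restores the original text unchanged.
assert as_utf8.decode('utf-8') == foo_utf
assert as_latin1.decode('latin-1') == foo_utf
```

Unlike the ASCII error handlers, neither encoding loses information for this string, but whatever reads the bytes has to know which encoding was used.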

Answered 2025-04-15 by Python大师


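This is really a Django question, not a Python one.

If your string lives in one of your .py files, make sure the following line is at the very top of the file:

# -*- coding: utf-8 -*-

Also, your string needs to be of type "unicode", written like this: u'foobar'

Then make sure your HTML page declares a Unicode encoding as well: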

<meta http-equiv="content-type" content="text/html;charset=utf-8" />

That's all. There is no need for any encoding or decoding; as long as everything is unicode, you are safe.

Answered 2025-04-15 by Python大师
