Converting Latin strings to Unicode in Python

l = ["Holding it Together", "Fowler RV Trip", "S\u00e9n\u00e9gal - Mali - Niger","H\u00eatres et \u00e9tang", "Coll\u00e8ge marsan","N\u00b0one", "Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1", "\u00cdndia, Tail\u00e2ndia & Cingapura"]

I want to convert these and store the strings in the list with their original characters instead of the escape sequences.

2 answers

Answer 1 · edited 2024-06-16 10:25:19

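When you serialize to JSON, there is probably a flag that lets you turn off the escaping of non-ASCII characters to \u sequences. If you are using the standard library json module, that flag is ensure_ascii: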

>>> import json
>>> print json.dumps(u'Índia')
"\u00cdndia"
>>> print json.dumps(u'Índia', ensure_ascii=False)
"Índia"

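The transcript above uses Python 2 syntax. A minimal Python 3 sketch of the same ensure_ascii flag, assuming only the standard-library json module (in Python 3 every str is already Unicode, so the u'' prefix is not needed):

```python
import json

name = "Índia"

# Default: non-ASCII characters are escaped to \uXXXX sequences.
escaped = json.dumps(name)

# ensure_ascii=False leaves the characters as they are.
readable = json.dumps(name, ensure_ascii=False)

print(escaped)   # "\u00cdndia"
print(readable)  # "Índia"
```
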
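Note, however, that with this safeguard removed you now have to handle non-ASCII characters correctly yourself, or you will get a pile of UnicodeErrors. For example, if you are writing the JSON to a file, you must explicitly encode the Unicode string into the character set you want (e.g. UTF-8).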
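A sketch of the file case in Python 3, assuming UTF-8 as the target charset and a hypothetical output path in the temp directory: opening the file in text mode with an explicit encoding makes Python perform the encoding step when the JSON text is written.

```python
import json
import os
import tempfile

places = ["Sénégal - Mali - Niger", "Hêtres et étang", "Índia, Tailândia & Cingapura"]

# Hypothetical output path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "places.json")

# Text mode with an explicit encoding: the str produced by json.dump
# is encoded to UTF-8 bytes on its way to disk.
with open(path, "w", encoding="utf-8") as f:
    json.dump(places, f, ensure_ascii=False)

# Read the raw bytes back to see what was actually stored.
with open(path, "rb") as f:
    raw = f.read()

print(raw.decode("utf-8"))
```

Without encoding="utf-8", Python falls back to the platform's default encoding, which may not be able to represent every character.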

Answer 2 · edited 2024-06-16 10:25:19

You have byte strings that contain unicode escape sequences. You can convert them to unicode with the unicode_escape codec:

>>> print "H\u00eatres et \u00e9tang".decode("unicode_escape")
Hêtres et étang
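In Python 3, str has no decode method, and unicode_escape is applied to bytes instead. A minimal sketch, assuming the escapes arrive as literal backslash text (for example read from a file) and that the text is pure ASCII, since unicode_escape reads any other byte as Latin-1:

```python
# A str holding literal backslash-u sequences; the r prefix stops Python
# from interpreting the escapes when the literal is parsed.
raw = r"H\u00eatres et \u00e9tang"

# unicode_escape works on bytes in Python 3, so go through ASCII first.
# Assumption: the input is pure ASCII text.
text = raw.encode("ascii").decode("unicode_escape")

print(text)  # Hêtres et étang
```
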

You can then encode it back into a byte string with whatever encoding you need.
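In Python 3 this is a single .encode call. A sketch assuming UTF-8 is the encoding you need:

```python
text = "Hêtres et étang"

# Encode the Unicode text into UTF-8 bytes (assumed target encoding).
data = text.encode("utf-8")
print(data)  # b'H\xc3\xaatres et \xc3\xa9tang'

# Decoding with the same codec gives the original text back.
print(data.decode("utf-8") == text)  # True
```
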

You can filter out the non-unicode strings and decode them like this:

for s in l: 
    if not isinstance(s, unicode): 
        print s.decode('unicode_escape')
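A Python 3 sketch of the same idea that stores the results in a new list, as the question asks. The helper decode_escapes is hypothetical; bytes plays the role of the Python 2 byte string and str the role of unicode. In Python 3 the list literal from the question would already contain the decoded characters, because \u escapes in ordinary str literals are processed when the code is parsed, so this only matters when the escapes are literal backslash text.

```python
def decode_escapes(items):
    """Return a list of str, decoding bytes items that hold \\uXXXX escapes.

    Items that are already str are assumed to be decoded and kept as-is,
    mirroring the isinstance(s, unicode) check in the Python 2 loop.
    """
    result = []
    for s in items:
        if isinstance(s, bytes):
            # Assumption: bytes items are ASCII text with escape sequences.
            result.append(s.decode("unicode_escape"))
        else:
            result.append(s)
    return result


# "\\u" in a bytes literal is a literal backslash followed by "u".
l = [b"Holding it Together", b"S\\u00e9n\\u00e9gal - Mali - Niger", "N°one"]
print(decode_escapes(l))
```
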
