Python 3.8: Escaping non-ASCII characters as Unicode escapes

2 answers

User

#1 · edited 2024-04-23 17:27:04

You can do it like this:

char_list = []
s1 = "Bürgerhaus"

for ch in s1:
    # Keep ASCII characters as-is; write any other character as a \uXXXX
    # escape built from its code point in hex (assumes code points <= 0xFFFF)
    if ord(ch) < 128:  # not sure if that is right or can be made easier!
        char_list.append(ch)
    else:
        char_list.append('\\u%04x' % ord(ch))

res = ''.join(char_list)
print(f"Mixed up string: {res}")

for my_str in (res, s1):
    # Only the escaped, pure-ASCII string is passed through unicode-escape
    if '\\u' in my_str:
        print(my_str.encode().decode('unicode-escape'))
    else:
        print(my_str)

Output:

Mixed up string: B\u00fcrgerhaus
Bürgerhaus
Bürgerhaus

Explanation:

First, we convert each character to its corresponding Unicode code point:

print([(c, ord(c)) for c in s1])
[('B', 66), ('ü', 252), ('r', 114), ('g', 103), ('e', 101), ('r', 114), ('h', 104), ('a', 97), ('u', 117), ('s', 115)]

Regular ASCII characters have decimal values < 128. Other characters, such as the euro sign or German umlauts, have values >= 128 (detailed table here).
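To make the 128 threshold concrete, here is a small sketch that prints the code point of a few sample characters. The helper name `is_ascii_char` is hypothetical and only used for illustration; `str.isascii()` is available from Python 3.7 onward.

```python
def is_ascii_char(ch: str) -> bool:
    """Hypothetical helper: True if a single character is in the ASCII range."""
    return ord(ch) < 128

# 'B' is ASCII; 'ü' and '€' are not.
for ch in ["B", "ü", "€"]:
    print(ch, ord(ch), is_ascii_char(ch), ch.isascii())
```

For single characters, the manual `ord(ch) < 128` check and the built-in `ch.isascii()` should agree.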

Now we "encode" every character with a value >= 128 as its corresponding Unicode escape representation.
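The steps above could be wrapped in a reusable function, roughly as follows. This is a sketch, not the answer's own code: the name `escape_non_ascii` is made up for illustration, and the `\UXXXXXXXX` branch for code points above 0xFFFF is an addition that the original loop does not handle.

```python
def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character written as a \\u or \\U escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)                 # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append('\\u%04x' % cp)     # 4 hex digits for the BMP
        else:
            parts.append('\\U%08x' % cp)     # 8 hex digits beyond the BMP
    return ''.join(parts)

escaped = escape_non_ascii("Bürgerhaus €")
print(escaped)  # B\u00fcrgerhaus \u20ac

# The escaped string is pure ASCII, so it can be decoded back safely.
restored = escaped.encode('ascii').decode('unicode-escape')
print(restored)  # Bürgerhaus €
```

Note that the round trip through `unicode-escape` also interprets any backslash sequences that were already present in the input, so it is only lossless for text without literal backslashes.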

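User

#2 · edited 2024-04-23 17:27:04

You can only decode() a bytestring (bytes) into a [unicode] string and, conversely, encode() a [unicode] string into bytes.

So if you want to decode a string that was escaped with unicode-escape, you first need to convert it to a bytestring with encode(), for example using latin1 as you wrote in your question: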
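As a quick illustration of that rule, the sketch below encodes a str into bytes with two different codecs and decodes it back. The variable names are arbitrary.

```python
text = "Bürgerhaus"

# str -> bytes: encode()
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'B\xc3\xbcrgerhaus'
print(latin1_bytes)  # b'B\xfcrgerhaus'

# bytes -> str: decode(), using the same codec that produced the bytes
print(utf8_bytes.decode("utf-8"))      # Bürgerhaus
print(latin1_bytes.decode("latin-1"))  # Bürgerhaus

# In Python 3, str has no decode() and bytes has no encode()
print(hasattr(text, "decode"), hasattr(utf8_bytes, "encode"))  # False False
```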

>>> encoded_str = 'B\\xfcrgerhaus'
>>> encoded = encoded_str.encode('latin-1')
>>> encoded
b'B\\xfcrgerhaus'
>>> encoded.decode('unicode-escape')
'Bürgerhaus'
>>> _.encode('unicode-escape')
b'B\\xfcrgerhaus'
>>> _ == encoded
True
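The choice of codec for the intermediate encode() step matters once the string contains real non-ASCII characters next to escape sequences. The sketch below (variable names are illustrative) shows that for an ASCII-only escaped string latin-1 and UTF-8 give the same bytes, while for a mixed string only latin-1 survives the unicode-escape decode, because that decoder treats unescaped bytes as Latin-1.

```python
# ASCII-only escaped string: both codecs produce identical bytes
escaped = "B\\xfcrgerhaus"
print(escaped.encode("latin-1") == escaped.encode("utf-8"))  # True

# Mixed string: a real 'ü' followed by a literal backslash-n escape
mixed = "ü\\n"
via_latin1 = mixed.encode("latin-1").decode("unicode-escape")
via_utf8 = mixed.encode("utf-8").decode("unicode-escape")
print(repr(via_latin1))  # 'ü\n'
print(repr(via_utf8))    # 'Ã¼\n'
```

Latin-1 can only encode code points up to 255, so a string containing, say, a literal euro sign would raise UnicodeEncodeError at the encode() step.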

See also: How do I .decode('string-escape') in Python3?
