Python：用Unicoderepresentation替换字符

网友

1楼 · 编辑于 2024-04-25 04:59:24

第一步是将字节字符串转换为Unicode字符串：

u = s.decode('utf-8')

第二步是创建一个新字符串，每个字符都被其Unicode转义序列替换。在

^{pr2}$

如果您只想替换非ASCII字符，则稍作修改即可：

new = ''.join(c if 32 <= ord(c) < 128 else '\\u{:04x}'.format(ord(c)) for c in u)

请注意，\u0000符号仅适用于基本Unicode平面中的Unicode码位。您需要\U00000000符号来表示更大的内容。您还可以使用\x00表示法来表示小于256的值。以下内容处理所有情况，可能更容易阅读：

def unicode_notation(c):
    x = ord(c)
    if 32 <= x < 128:
        return c
    if x < 256:
        return '\\x{:02x}'.format(x)
    if x < 0x10000:
        return '\\u{:04x}'.format(x)
    return '\\U{:08x}'.format(x)

new = ''.join(unicode_notation(c) for c in u)

网友

2楼 · 编辑于 2024-04-25 04:59:24

不知道你说的“unicode表示法”是什么意思。以下是我对你问题的解释。在

对于Python 3：

print('\\u' + hex(ord('ä'))[2:].upper().rjust(4, '0'))

对于Python 2：

^{pr2}$

网友

3楼 · 编辑于 2024-04-25 04:59:24

我想这就是你想要的？在

>>> new = s.decode('utf-8')

相关问题更多 >

编程相关推荐

热门问题

热门文章

Python：用Unicoderepresentation替换字符

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >