Decoding an emoji into two (or more) code points using the standard library

3 Answers

User

Answer 1 · edited 2024-05-29 08:31:23

A combination of encoding to UTF-32 and struct.unpack can do what you need:

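>>> import struct
>>> to_decode = u'\U0001f1f2\U0001f1e9'  # the flag emoji from the question
>>> b = to_decode.encode('utf_32_le')    # 4 bytes per code point, no BOM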
>>> count = len(b) // 4
>>> count
2
>>> cp = struct.unpack('<%dI' % count, b)
>>> [hex(x) for x in cp]
['0x1f1f2', '0x1f1e9']
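
The session above can be wrapped into a reusable function. This is only a minimal sketch, assuming Python 3; the name code_points_utf32 is made up for illustration and does not come from the standard library.

```python
import struct

def code_points_utf32(text):
    """Return the code points of text as a list of ints (hypothetical helper)."""
    # 'utf-32-le' writes exactly 4 little-endian bytes per code point, no BOM.
    data = text.encode('utf-32-le')
    count = len(data) // 4
    # '<%dI' unpacks `count` little-endian unsigned 32-bit integers.
    return list(struct.unpack('<%dI' % count, data))

to_decode = '\U0001f1f2\U0001f1e9'
print([hex(cp) for cp in code_points_utf32(to_decode)])  # ['0x1f1f2', '0x1f1e9']
```

Because every code point takes a fixed four bytes in UTF-32, the same function should handle sequences of any length, not just pairs.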

User

Answer 2 · edited 2024-05-29 08:31:23

This is a hack, but you can use the repr of the unicode string:

>>> repr(to_decode)
"u'\\U0001f1f2\\U0001f1e9'"

So:

>>> hex(int(repr(to_decode)[4:12], 16))
'0x1f1f2'

and

>>> hex(int(repr(to_decode)[14:22], 16))
'0x1f1e9'

This approach would have to be extended to support emojis with more than two code points. You might consider combining the above with {{CD2}}.
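
One way the escape-based hack might be extended to any number of code points is to scan the escaped form with a regular expression instead of slicing at fixed offsets. The following is a rough sketch for Python 3, not the answerer's own method: the function name is hypothetical, and it assumes every character of interest lies outside the Basic Multilingual Plane, so that it escapes as \UXXXXXXXX. Characters such as U+200D (zero width joiner) escape differently and are skipped.

```python
import re

def code_points_from_escape(text):
    """Extract astral code points from the escaped form (hypothetical helper)."""
    # unicode_escape renders each character above U+FFFF as \U plus 8 hex digits.
    escaped = text.encode('unicode_escape').decode('ascii')
    # Assumption: the text contains no literal backslash followed by 'U'.
    return [hex(int(h, 16)) for h in re.findall(r'\\U([0-9a-fA-F]{8})', escaped)]

to_decode = '\U0001f1f2\U0001f1e9'
print(code_points_from_escape(to_decode))  # ['0x1f1f2', '0x1f1e9']
```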

User

Answer 3 · edited 2024-05-29 08:31:23

What this problem actually calls for is list(), which in Python 3 breaks a Unicode string into its constituent code points:

to_decode = u'🇲🇩'
list(to_decode)
['🇲', '🇩']
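
To go from those one-character strings to numeric code points, ord() can be applied to each item. A small sketch, assuming Python 3 (on a narrow Python 2 build, iterating the string would yield surrogate halves instead); the unicodedata lookup is only added here to show what each code point is.

```python
import unicodedata

to_decode = '\U0001f1f2\U0001f1e9'  # the flag emoji from the question

# Iterating a Python 3 str yields one item per code point.
code_points = [ord(ch) for ch in to_decode]
names = [unicodedata.name(ch) for ch in to_decode]

print([hex(cp) for cp in code_points])  # ['0x1f1f2', '0x1f1e9']
print(names)
```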

As an example, I created a Unicode visualization of the Bengali alphabet:

https://www.kaggle.com/jamesmcguigan/unicode-visualization-of-the-bengali-alphabet
