How do I reverse a Unicode decomposition using Python?

User

Reply 1 · edited 2024-05-12 19:13:32

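I can't really give you a definitive answer, because I have never tried it. But the standard library has a unicodedata module. It has two functions, decomposition() and normalize(), that might help you.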
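As a rough sketch of what those two functions do (assuming Python 3; the sample character ç is only an illustration and is not taken from the question):

```python
import unicodedata

# decomposition() returns the character's decomposition mapping
# as a string of hexadecimal code points.
mapping = unicodedata.decomposition("\u00e7")          # ç -> '0063 0327'

# normalize("NFD", ...) actually splits the character into base + combining mark,
# and normalize("NFC", ...) puts the pieces back together.
decomposed = unicodedata.normalize("NFD", "\u00e7")    # 'c' followed by U+0327
recomposed = unicodedata.normalize("NFC", decomposed)  # back to the single U+00E7
```

Here decomposition() only reports the mapping, while normalize() performs the conversion in either direction.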

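Edit: make sure the text really is decomposed in the Unicode sense. Sometimes there are odd ways of writing characters that cannot be represented directly in a given encoding. For example "a, which is meant to be read as ä, either mentally by a human or by some specialized program.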
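One way to check this, sketched under the assumption of Python 3, is to look for real combining characters with unicodedata.combining(). The helper name has_combining_marks is made up for this example.

```python
import unicodedata

def has_combining_marks(text):
    """Return True if any character has a nonzero canonical combining class."""
    return any(unicodedata.combining(ch) != 0 for ch in text)

unicode_style = has_combining_marks("a\u0308")  # 'a' + U+0308 COMBINING DIAERESIS
ascii_style = has_combining_marks('"a')         # ASCII quote followed by 'a'
```

If this returns False for your text, normalize() alone will not turn it into precomposed characters, and the ad hoc notation has to be handled separately.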

User

Reply 2 · edited 2024-05-12 19:13:32

Unfortunately it seems I actually have (for example) \u00B8 (cedilla) instead of \u0327 (combining cedilla) in my text.

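Ugh, nasty! You can still do this automatically, though the process won't be completely lossless, because it involves a compatibility decomposition (NFKD).

Normalize U+00B8 to NFKD and you get a space followed by U+0327. You can then scan the string for every case of a space followed by a combining character and remove that space. Finally, recompose to NFC so that the combining character is attached to the preceding character.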

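import unicodedata
s = unicodedata.normalize('NFKD', s)
# Drop a space only when the next character is a combining character (a trailing space is kept)
s = ''.join(c for i, c in enumerate(s) if c != ' ' or i + 1 == len(s) or unicodedata.combining(s[i + 1]) == 0)
s = unicodedata.normalize('NFC', s)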
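The same three steps can be wrapped in a function and tried on a small input. This is only a sketch, assuming Python 3; the name attach_spacing_marks and the sample string are made up for illustration.

```python
import unicodedata

def attach_spacing_marks(s):
    """Hypothetical helper: NFKD, drop spaces that precede a combining mark, then NFC."""
    s = unicodedata.normalize("NFKD", s)
    kept = []
    for i, c in enumerate(s):
        next_is_combining = i + 1 < len(s) and unicodedata.combining(s[i + 1]) != 0
        if c == " " and next_is_combining:
            continue  # this space only carried the mark, so drop it
        kept.append(c)
    return unicodedata.normalize("NFC", "".join(kept))

# 'c' followed by U+00B8 CEDILLA (the spacing form) becomes the single character U+00E7
result = attach_spacing_marks("c\u00b8")
```

Keep in mind that NFKD rewrites every compatibility character in the string, not just the spacing accents (a ligature such as U+FB01 becomes plain "fi"), and that a genuine space which happens to precede a combining character is removed as well.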

User

Reply 3 · edited 2024-05-12 19:13:32

I think this is what you are looking for:

>>> import unicodedata
>>> print(unicodedata.normalize("NFC", u"c\u0327"))
ç
