there are two additional normal forms based on compatibility
equivalence. In Unicode, certain characters are supported which
normally would be unified with other characters. For example, U+2160
(ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL
LETTER I). However, it is supported in Unicode for compatibility with
existing character sets (e.g. gb2312).
The normal form KD (NFKD) will apply the compatibility decomposition,
i.e. replace all compatibility characters with their equivalents.
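As a small illustration of the excerpt above, the following sketch compares canonical decomposition (NFD) with compatibility decomposition (NFKD) on the Roman numeral character it mentions. The variable names are chosen here for illustration only.

```python
import unicodedata

# U+2160 ROMAN NUMERAL ONE, the compatibility character from the excerpt
roman_one = "\u2160"

# NFD only applies canonical decomposition, so the character is unchanged
nfd = unicodedata.normalize("NFD", roman_one)

# NFKD also applies compatibility decomposition, giving U+0049 (LATIN CAPITAL LETTER I)
nfkd = unicodedata.normalize("NFKD", roman_one)

print(unicodedata.name(roman_one))
print(nfd == roman_one)
print(nfkd == "I")
```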
You can use the unicodedata library to convert Unicode to ASCII.
We will use the "NFKD" form of unicodedata.normalize for the conversion, as described in the unicodedata documentation excerpt above.
Therefore, the solution is:
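The original code for this answer is not preserved here, so the following is only a minimal sketch of the approach described: normalize with "NFKD", then encode to ASCII while ignoring whatever cannot be represented. The helper name `to_ascii` is an assumption made for this example and does not come from the answer.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Best-effort conversion of a Unicode string to plain ASCII (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark and
    # replaces compatibility characters with their equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops every code point that is still
    # outside ASCII, such as the combining marks left by the decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Caf\u00e9"))  # accented letter
print(to_ascii("\u2160"))     # ROMAN NUMERAL ONE
print(to_ascii("\ufb01"))     # LATIN SMALL LIGATURE FI
```

Note that this conversion is lossy: characters with no ASCII-compatible decomposition (for example CJK characters) are dropped silently instead of being transliterated.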