text = '''This was heart wrenching \u2764\ufe0f
Amazing compassion \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c #tears
\u2764\ufe0f\u2764\ufe0f\u2764\ufe0f'''
print(text.encode('ascii', 'namereplace').decode())
Result:
This was heart wrenching \N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}
Amazing compassion \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c #tears
\N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}\N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}\N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}
\N{THUMBS UP SIGN}
import unicodedata
# http://www.unicode.org/reports/tr44/#General_Category_Values
for char in text:
    try:
        print(char, '|', unicodedata.category(char), '|', unicodedata.name(char))
    except ValueError:
        print(repr(char), '| (repr)')
Result:
T | Lu | LATIN CAPITAL LETTER T
h | Ll | LATIN SMALL LETTER H
i | Ll | LATIN SMALL LETTER I
s | Ll | LATIN SMALL LETTER S
| Zs | SPACE
w | Ll | LATIN SMALL LETTER W
a | Ll | LATIN SMALL LETTER A
s | Ll | LATIN SMALL LETTER S
| Zs | SPACE
h | Ll | LATIN SMALL LETTER H
e | Ll | LATIN SMALL LETTER E
a | Ll | LATIN SMALL LETTER A
r | Ll | LATIN SMALL LETTER R
t | Ll | LATIN SMALL LETTER T
| Zs | SPACE
w | Ll | LATIN SMALL LETTER W
r | Ll | LATIN SMALL LETTER R
e | Ll | LATIN SMALL LETTER E
n | Ll | LATIN SMALL LETTER N
c | Ll | LATIN SMALL LETTER C
h | Ll | LATIN SMALL LETTER H
i | Ll | LATIN SMALL LETTER I
n | Ll | LATIN SMALL LETTER N
g | Ll | LATIN SMALL LETTER G
| Zs | SPACE
❤ | So | HEAVY BLACK HEART
️ | Mn | VARIATION SELECTOR-16
'\n' | (repr)
A | Lu | LATIN CAPITAL LETTER A
m | Ll | LATIN SMALL LETTER M
a | Ll | LATIN SMALL LETTER A
z | Ll | LATIN SMALL LETTER Z
i | Ll | LATIN SMALL LETTER I
n | Ll | LATIN SMALL LETTER N
g | Ll | LATIN SMALL LETTER G
| Zs | SPACE
c | Ll | LATIN SMALL LETTER C
o | Ll | LATIN SMALL LETTER O
m | Ll | LATIN SMALL LETTER M
p | Ll | LATIN SMALL LETTER P
a | Ll | LATIN SMALL LETTER A
s | Ll | LATIN SMALL LETTER S
s | Ll | LATIN SMALL LETTER S
i | Ll | LATIN SMALL LETTER I
o | Ll | LATIN SMALL LETTER O
n | Ll | LATIN SMALL LETTER N
| Zs | SPACE
'\ud83d' | (repr)
'\udc9c' | (repr)
'\ud83d' | (repr)
'\udc9c' | (repr)
'\ud83d' | (repr)
'\udc9c' | (repr)
| Zs | SPACE
# | Po | NUMBER SIGN
t | Ll | LATIN SMALL LETTER T
e | Ll | LATIN SMALL LETTER E
a | Ll | LATIN SMALL LETTER A
r | Ll | LATIN SMALL LETTER R
s | Ll | LATIN SMALL LETTER S
'\n' | (repr)
❤ | So | HEAVY BLACK HEART
️ | Mn | VARIATION SELECTOR-16
❤ | So | HEAVY BLACK HEART
️ | Mn | VARIATION SELECTOR-16
❤ | So | HEAVY BLACK HEART
️ | Mn | VARIATION SELECTOR-16
import unicodedata
text = '''This was heart wrenching \u2764\ufe0f
Amazing compassion \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c #tears
\u2764\ufe0f\u2764\ufe0f\u2764\ufe0f'''
result = []
for char in text:
    category = unicodedata.category(char)
    if category in ('So', 'Mn'):
        # symbols and combining marks: replace with :NAME:
        result.append(':{}:'.format(unicodedata.name(char)))
    elif category == 'Cs':
        # lone surrogates have no name: replace with ?
        result.append('?')
    else:
        result.append(char)
print(''.join(result))
Result:
This was heart wrenching :HEAVY BLACK HEART::VARIATION SELECTOR-16:
Amazing compassion ?????? #tears
:HEAVY BLACK HEART::VARIATION SELECTOR-16::HEAVY BLACK HEART::VARIATION SELECTOR-16::HEAVY BLACK HEART::VARIATION SELECTOR-16:
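EDIT: I think I found a solution to the problem with the code \ud83d\udc9c. It converts the surrogate values \ud83d\udc9c into the correct emoji value \U0001f49c.
Source: How to work with surrogate pairs in Python?
Wikipedia: Surrogate
Other: Unicode character inspector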
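The snippet that belonged to this edit did not survive on this page. As a minimal sketch of one common way to do the conversion, assuming the surrogates always arrive as valid high/low pairs, the text can be encoded to UTF-16 with the surrogatepass error handler and decoded again. The helper name fix_surrogates is made up for illustration and is not from the original answer.

```python
def fix_surrogates(text):
    """Join UTF-16 surrogate pairs into single code points.

    'surrogatepass' lets the encoder write surrogates as raw UTF-16
    code units; decoding those units pairs them up again.
    """
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


broken = 'Amazing compassion \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c #tears'
fixed = fix_surrogates(broken)

# Each pair of surrogates collapses into one character
print(len(broken), len(fixed))
# The repaired characters now have names that 'namereplace' can use
print(fixed.encode('ascii', 'namereplace').decode())
```

An unpaired surrogate would still fail on the decode step, so this only helps when both halves of each pair are present.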
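Using Google I found the 'namereplace' error handler for encode(), which gives results such as \N{THUMBS UP SIGN}. So before asking on Stack Overflow it is better to use Google first.
https://docs.python.org/3/howto/unicode.html
The same works with your text, as the first snippet and its result show.
Now you may need to remove \N{ and }.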
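One possible way to strip the \N{ and } wrappers is a regular expression over the encoded string. This is only a sketch: the :NAME: output format is borrowed from the later result on this page, and the variable names are chosen for illustration.

```python
import re

text = 'This was heart wrenching \u2764\ufe0f'
named = text.encode('ascii', 'namereplace').decode()

# Rewrite every \N{NAME} escape as :NAME:
cleaned = re.sub(r'\\N\{([^}]*)\}', r':\1:', named)
print(cleaned)
```

The pattern only touches \N{...} escapes, so any \uXXXX escapes left behind for unnamed characters pass through unchanged.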
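But it has a problem with \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c, which stay as raw escapes in the result.
You can use unicodedata in a for loop to get the name of every character in the text, but a character without a name, such as '\n', raises ValueError. It also gives names for ordinary characters, so you may have to use unicodedata.category() to decide which characters to replace. This also has a problem with \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c, as the second result above shows.
Because it has a problem with \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c, I replace each surrogate with ? (third snippet and result above).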
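Instead of replacing the surrogates with ?, the category check can be combined with a UTF-16 round trip that joins each surrogate pair first, so the purple heart gets a name as well. This is a sketch rather than the original answer's code: it assumes every surrogate is part of a valid pair, and the function name emoji_to_names is hypothetical.

```python
import unicodedata


def emoji_to_names(text):
    # Assumption: surrogates come in valid pairs, so this round trip succeeds
    text = text.encode('utf-16', 'surrogatepass').decode('utf-16')
    result = []
    for char in text:
        if unicodedata.category(char) in ('So', 'Mn'):
            # Fall back to the character itself if it has no name
            name = unicodedata.name(char, None)
            result.append(':{}:'.format(name) if name else char)
        else:
            result.append(char)
    return ''.join(result)


print(emoji_to_names('Amazing compassion \ud83d\udc9c\ud83d\udc9c\ud83d\udc9c #tears'))
```

Passing a default to unicodedata.name() avoids the ValueError for unnamed characters, so no try/except is needed here.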
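EDIT: Using Google again I found the external module emoji, which can convert some names, but it also has a problem with \ud83d\udc9c, so I use repr to display it, although that also prints each newline as \n.
http://www.unicode.org/emoji/charts/full-emoji-list.html
https://www.webfx.com/tools/emoji-cheat-sheet/
http://unicode.org/Public/emoji/12.0/emoji-test.txt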
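BTW: I found the module demoji, which can find emoji and give their names. But it also has a problem with the code \ud83d\udc9c. After installing the module you need to run demoji.download_codes() only once.
If you get the data as JSON, "\ud83d\udc9c", then you should not have a problem because it should be converted automatically. In other cases you have to convert it yourself; see: How to work with surrogate pairs in Python?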
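The JSON case can be checked with the standard json module. A small sketch, assuming the data arrives as JSON text in which the emoji is written as an escaped surrogate pair:

```python
import json

# JSON text containing an escaped surrogate pair (note the doubled backslashes)
raw = '"\\ud83d\\udc9c"'
decoded = json.loads(raw)

# The decoder joins the pair into one code point
print(len(decoded))
print(decoded == '\U0001f49c')
```

The difference from the earlier snippets is that there the escapes sit in a Python string literal, which stores the two surrogates as separate characters.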