Emojis: encoding/decoding when a text file contains UTF-8 and UTF-16

....
{"emojiCharts":{"emoji_icon":"\u2697","repost": 3, "doc": 3, "engagement": 1184, "reach": 6734, "impression": 44898}}
{"emojiCharts":{"emoji_icon":"\U0001f924","repost": 11, "doc": 11, "engagement": 83, "reach": 1047, "impression": 6981}}
....

with open(OUTPUT, "r") as infileInsight:
    insightData = infileInsight.read()\
        .decode('raw_unicode_escape')
with open(OUTPUT, "w+") as outfileInsight:
    outfileInsight.write(insightData.encode('utf-8'))

3 Answers

User

#1 · Edited 2024-04-26 13:10:47

OK. Python 2.7, Windows 10.

The original file is plain ASCII and contains literal Unicode escape sequences (of the form "\u####").

Read the file and decode it with "unicode escape": you then have a Python unicode string; let's call it your_unicode_string.

To write the file, choose either:
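As a rough illustration of this decoding step, here is a minimal sketch. It assumes Python 3 (the answer itself targets Python 2.7), the sample bytes are adapted from the question's data, and the function name `decode_escapes` is hypothetical. It relies on the input being pure ASCII, because the `unicode_escape` codec interprets any non-escape bytes as Latin-1.

```python
def decode_escapes(raw_bytes):
    # Turn literal \uXXXX and \UXXXXXXXX escapes found in ASCII bytes
    # into the characters they stand for.
    return raw_bytes.decode('unicode_escape')


# Raw bytes as they would come back from reading the file in binary mode ('rb').
# The doubled backslashes mean the data holds a real backslash followed by 'u' or 'U'.
sample = b'{"emoji_icon":"\\u2697"} {"emoji_icon":"\\U0001f924"}'

your_unicode_string = decode_escapes(sample)
```

After this step, `your_unicode_string` holds the two emoji as single code points rather than as escape text.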

output_encoding = 'utf-8'

or

output_encoding = 'utf-16'

Then:

import codecs
with codecs.open(output_filename, 'w', encoding=output_encoding) as fpo:
    # fpo.write(u'\ufeff') # for windows, you might want to write this at the start
    fpo.write(your_unicode_string)
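For comparison, the same write step might look like this in Python 3, where the built-in `open` accepts an `encoding` argument. This is only a sketch under assumptions: the helper name `write_text`, the temporary directory, and the stand-in string are all hypothetical, and each output file is read back as raw bytes purely to show what ends up on disk.

```python
import os
import tempfile

# Stand-in for the decoded file contents (an assumption for this sketch).
your_unicode_string = '\u2697 \U0001f924'


def write_text(path, text, output_encoding):
    # newline='' leaves line endings untouched on every platform.
    with open(path, 'w', encoding=output_encoding, newline='') as fpo:
        fpo.write(text)


results = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for output_encoding in ('utf-8', 'utf-16'):
        path = os.path.join(tmp_dir, 'out-' + output_encoding + '.txt')
        write_text(path, your_unicode_string, output_encoding)
        # Read the raw bytes back to inspect the on-disk representation.
        with open(path, 'rb') as fpi:
            results[output_encoding] = fpi.read()
```

In this sketch the 'utf-16' codec writes a byte order mark on its own, while 'utf-8' does not. In Python 3, the 'utf-8-sig' codec is one way to get the BOM that the commented-out line writes by hand.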

With this Python and OS version, you won't be able to see the emojis in the console without some tinkering.

User

#2 · Edited 2024-04-26 13:10:47

You can do this.

print a["emojiCharts"]["emoji_icon"].decode("unicode-escape")

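Output: ⚗

User

#3 · Edited 2024-04-26 13:10:47

This has nothing to do with UTF-8 or UTF-16. In general, this is just Python's way of escaping Unicode characters: \uFFFF for all characters up to U+FFFF, and \UFFFFFFFF for everything above (for historical reasons).

Both escape sequences should work in exactly the same way in Python strings. On my machine, using @vks's solution: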

$ python
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U0000ABCD'.decode('unicode-escape')
u'\uabcd'
>>> '\uABCD'.decode('unicode-escape')
u'\uabcd'

It's similar in Python 3.
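A Python 3 version of that check could look like the sketch below. In Python 3 only bytes have a `.decode` method, so the escapes are written as byte literals; the variable names are hypothetical. The second half illustrates why the escapes are independent of UTF-8 and UTF-16: one decoded code point is simply encoded in two different ways.

```python
# Both escape forms decode to the same single character.
short_form = b'\\uABCD'.decode('unicode_escape')
long_form = b'\\U0000ABCD'.decode('unicode_escape')
same_character = (short_form == long_form == '\uabcd')

# A code point above U+FFFF needs the long form.
emoji = b'\\U0001f924'.decode('unicode_escape')

# The same code point, encoded two ways:
utf8_bytes = emoji.encode('utf-8')       # 4 bytes
utf16_bytes = emoji.encode('utf-16-le')  # 4 bytes (a surrogate pair), no BOM
```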
