How can I keep Unicode characters as-is after writing them to a JSON file?

Posted 2024-04-29 19:36:06


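I'm working with a file that contains Unicode emojis. Opened as-is, it looks fine and I can see the emojis. But when I read it with the json module and write it back out, the emojis are converted into something like "\ud83d\ude00". So my emoji "😀" becomes "\ud83d\ude00" after writing. I'm using the following code: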

import json

with open("emoji-by-category.json", encoding='utf-8', errors='ignore') as json_data:
    data = json.load(json_data, strict=False)

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4)
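A minimal sketch (illustrative, not from the original post; variable names are arbitrary) of what json.dump does by default: with ensure_ascii=True, a character outside the Basic Multilingual Plane such as 😀 is written as \uXXXX escapes of its UTF-16 surrogate pair. The escaped text is still valid JSON and decodes back to the same character, so nothing is lost; it is only less readable.

```python
import json

emoji = "\U0001F600"  # 😀, code point U+1F600

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped,
# and a character above U+FFFF becomes a UTF-16 surrogate pair.
escaped = json.dumps(emoji)
print(escaped)  # "\ud83d\ude00"

# The escape is only a representation: decoding restores the original character.
print(json.loads(escaped) == emoji)  # True
```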

Here is a sample of the JSON file:

[
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "1",
        "code": "U+1F600",
        "text": "\ud83d\ude00",
        "recentlyAdded": false,
        "name": "grinning face",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": false,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grin",
            "grinning face"
        ],
        "keywords": [
            "face",
            "grin",
            "grinning",
            "subdivision",
            "flag",
            ":D",
            "grinning face"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "2",
        "code": "U+1F603",
        "text": "\ud83d\ude03",
        "recentlyAdded": false,
        "name": "grinning face with big eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": true,
            "KDDI": true,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grinning face with big eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "face",
            "grinning",
            "big",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "eye",
            ":D",
            ":)",
            "grinning face with big eyes"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "3",
        "code": "U+1F604",
        "text": "\ud83d\ude04",
        "recentlyAdded": false,
        "name": "grinning face with smiling eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "eye",
            "face",
            "grinning face with smiling eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "eye",
            "face",
            "grinning",
            "smiling",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "joy",
            "funny",
            "haha",
            "laugh",
            ":D",
            ":)",
            "grinning face with smiling eyes"
        ]
    }
]
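The two escapes in each "text" field correspond to the "code" field (for example U+1F600). As a sketch of the standard UTF-16 rule (the helper to_surrogate_pair is hypothetical, written only for illustration), a code point above U+FFFF is split into a high and a low surrogate like this:

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    offset = code_point - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


code = "U+1F600"                 # value of the "code" field
cp = int(code[2:], 16)           # 0x1F600
high, low = to_surrogate_pair(cp)
print(f"\\u{high:04x}\\u{low:04x}")  # \ud83d\ude00
```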

1 answer
User
Answer #1 · Posted 2024-04-29 19:36:06

Pass ensure_ascii=False to json.dump:

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4, ensure_ascii=False)
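A self-contained sketch of this fix, assuming a trimmed-down record in place of the full emoji-by-category.json and a temporary directory for the output file. It writes with ensure_ascii=False and reads the file back to confirm the emoji is stored literally rather than as \ud83d\ude00. errors='ignore' is left out here because UTF-8 can encode the emoji.

```python
import json
import os
import tempfile

# Trimmed sample record (assumption: only a few fields of the real file).
data = [{"code": "U+1F600", "text": "\U0001F600", "name": "grinning face"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emoji-by-category2.json")

    # ensure_ascii=False writes non-ASCII characters unescaped;
    # the file's encoding (UTF-8) decides the bytes on disk.
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(data, json_file, indent=4, ensure_ascii=False)

    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "rb") as f:
        raw = f.read()

print('"text": "\U0001F600"' in text)  # True
print("\\ud83d" in text)               # False
print(b"\xf0\x9f\x98\x80" in raw)      # True (UTF-8 bytes of 😀)
```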

From the Python documentation, "json — JSON encoder and decoder":

Basic Usage:

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

… If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. …
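Because ensure_ascii=False passes characters through as-is, the encoding of the output file starts to matter. The sketch below uses str.encode as a stand-in for what open(..., encoding=..., errors=...) does when the file is written (a simplification, not the exact file I/O path): UTF-8 stores the emoji, a strict ASCII encoding raises UnicodeEncodeError, and errors='ignore', as used in both snippets above, would silently drop the emoji.

```python
import json

data = {"text": "\U0001F600"}  # 😀
as_is = json.dumps(data, ensure_ascii=False)  # '{"text": "😀"}'

# UTF-8 can represent the emoji, so nothing is lost.
utf8_bytes = as_is.encode("utf-8")

# An ASCII-only target fails loudly in strict mode...
try:
    as_is.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# ...but with errors="ignore" the emoji is silently dropped.
lossy = as_is.encode("ascii", errors="ignore")

print(utf8_bytes)     # b'{"text": "\xf0\x9f\x98\x80"}'
print(strict_failed)  # True
print(lossy)          # b'{"text": ""}'
```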
