Unicode-escaped quotes are automatically evaluated in Python

1 vote
4 answers
2957 views
Asked 2025-04-17 23:54

I'm currently working with some JSON strings that contain Unicode-escaped quotes, in the following form:

'{"test":"\u0022"}'

When Python evaluates this as a string literal, the result is:

'{"test":"""}'

This causes a ValueError when loading it:

>>> json.loads('{"test":"\u0022"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.3/json/decoder.py", line 352, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.3/json/decoder.py", line 368, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting ',' delimiter: line 1 column 11 (char 10)
>>> 

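I could work around this by treating the input as a byte string before it is interpreted as UTF-8 and doing a find-and-replace. However, that is not possible for the input I actually work with, because it comes from a library that queries an API, and that API returns UTF-8 encoded strings.

Is there a way to stop Python from automatically evaluating these Unicode escapes?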

4 Answers

0

You should be getting a bytes object containing the JSON string. You need to decode it before passing it to json.loads. With Python 3, this works fine.

>>> url = "http://api.tumblr.com/v2/blog/distant-traveller.tumblr.com/posts?api_key=IkJtqSbg6Nd3OBnUdaGl9YWE3ocupygJcnPebHRou8eFbd4RUv&id=79086448801"
>>> import json, urllib.request
>>> jdata = urllib.request.urlopen(url).read()
>>> json.loads(jdata.decode())
{'meta': {'msg': 'OK', 'status': 200}, 'response': {'total_posts': 1, 'blog': {'is_nsfw': False, 'ask': True, 'ask_page_title': 'Ask me anything', 'posts': 5152, 'url': 'http://distant-traveller.tumblr.com/', 'name': 'distant-traveller', 'likes': 44022, 'description': '"The surface of the Earth is the shore of the cosmic ocean... Recently, we\'ve managed to wade a little way out, and the water seems inviting." - Carl Sagan', 'share_likes': True, 'updated': 1395784772, 'title': 'Voyage into Space', 'ask_anon': True}, 'posts': [{'source_url': 'http://wonderous-world.com/post/77780009786/starry-sky-and-jupiter-by-timo-braun', 'image_permalink': 'http://distant-traveller.tumblr.com/image/79086448801', 'link_url': 'http://wonderous-world.tumblr.com', 'source_title': 'wonderous-world', 'photos': [{'caption': '', 'alt_sizes': [{'height': 750, 'width': 500, 'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_500.jpg'}, {'height': 600, 'width': 400, 'url': 'http://25.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_400.jpg'}, {'height': 375, 'width': 250, 'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_250.jpg'}, {'height': 150, 'width': 100, 'url': 'http://25.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_100.jpg'}, {'height': 75, 'width': 75, 'url': 'http://24.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_75sq.jpg'}], 'original_size': {'height': 750, 'width': 500, 'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_500.jpg'}}], 'id': 79086448801, 'state': 'published', 'tags': [], 'date': '2014-03-09 20:01:37 GMT', 'timestamp': 1394395297, 'note_count': 7503, 'reblog_key': 'IFKcbmbd', 'short_url': 'http://tmblr.co/ZbkMUw19fwt2X', 'blog_name': 'distant-traveller', 'post_url': 
'http://distant-traveller.tumblr.com/post/79086448801/wonderous-world-starry-sky-and-jupiter-by-timo', 'slug': 'wonderous-world-starry-sky-and-jupiter-by-timo', 'type': 'photo', 'caption': '<p><a class="tumblr_blog" href="http://wonderous-world.com/post/77780009786/starry-sky-and-jupiter-by-timo-braun">wonderous-world</a>:</p>\n<blockquote>\n<p><a href="http://www.flickr.com/photos/timobraunphotos/12695374254/">Starry Sky and Jupiter</a> by\xa0<a class="owner-name truncate" href="http://www.flickr.com/photos/timobraunphotos/" title="Go to Timo Braun\'s photostream" data-track="attributionNameClick">Timo Braun</a></p>\n</blockquote>', 'format': 'html', 'highlighted': []}]}}

Pretty-printed version:

>>> import pprint
>>> pprint.pprint(json.loads(jdata.decode()))
{'meta': {'msg': 'OK', 'status': 200},
 'response': {'blog': {'ask': True,
                       'ask_anon': True,
                       'ask_page_title': 'Ask me anything',
                       'description': '"The surface of the Earth is the shore of the cosmic ocean... Recently, we\'ve managed to wade a little way out, and the water seems inviting." - Carl Sagan',
                       'is_nsfw': False,
                       'likes': 44022,
                       'name': 'distant-traveller',
                       'posts': 5152,
                       'share_likes': True,
                       'title': 'Voyage into Space',
                       'updated': 1395784772,
                       'url': 'http://distant-traveller.tumblr.com/'},
              'posts': [{'blog_name': 'distant-traveller',
                         'caption': '<p><a class="tumblr_blog" href="http://wonderous-world.com/post/77780009786/starry-sky-and-jupiter-by-timo-braun">wonderous-world</a>:</p>\n<blockquote>\n<p><a href="http://www.flickr.com/photos/timobraunphotos/12695374254/">Starry Sky and Jupiter</a> by\xa0<a class="owner-name truncate" href="http://www.flickr.com/photos/timobraunphotos/" title="Go to Timo Braun\'s photostream" data-track="attributionNameClick">Timo Braun</a></p>\n</blockquote>',
                         'date': '2014-03-09 20:01:37 GMT',
                         'format': 'html',
                         'highlighted': [],
                         'id': 79086448801,
                         'image_permalink': 'http://distant-traveller.tumblr.com/image/79086448801',
                         'link_url': 'http://wonderous-world.tumblr.com',
                         'note_count': 7503,
                         'photos': [{'alt_sizes': [{'height': 750,
                                                    'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_500.jpg',
                                                    'width': 500},
                                                   {'height': 600,
                                                    'url': 'http://25.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_400.jpg',
                                                    'width': 400},
                                                   {'height': 375,
                                                    'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_250.jpg',
                                                    'width': 250},
                                                   {'height': 150,
                                                    'url': 'http://25.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_100.jpg',
                                                    'width': 100},
                                                   {'height': 75,
                                                    'url': 'http://24.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_75sq.jpg',
                                                    'width': 75}],
                                     'caption': '',
                                     'original_size': {'height': 750,
                                                       'url': 'http://31.media.tumblr.com/159388efacbfa78e281fd5aa0476864f/tumblr_n1jddlcPjW1r787hmo1_500.jpg',
                                                       'width': 500}}],
                         'post_url': 'http://distant-traveller.tumblr.com/post/79086448801/wonderous-world-starry-sky-and-jupiter-by-timo',
                         'reblog_key': 'IFKcbmbd',
                         'short_url': 'http://tmblr.co/ZbkMUw19fwt2X',
                         'slug': 'wonderous-world-starry-sky-and-jupiter-by-timo',
                         'source_title': 'wonderous-world',
                         'source_url': 'http://wonderous-world.com/post/77780009786/starry-sky-and-jupiter-by-timo-braun',
                         'state': 'published',
                         'tags': [],
                         'timestamp': 1394395297,
                         'type': 'photo'}],
              'total_posts': 1}}
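
The decode-then-parse step above can be tried without a network call. This is only a minimal sketch: `parse_json_bytes` is a hypothetical helper name, the `payload` bytes are a made-up stand-in for what `urlopen(url).read()` returns, and UTF-8 is assumed as the charset.

```python
import json


def parse_json_bytes(payload: bytes, charset: str = "utf-8") -> object:
    """Decode raw response bytes to text, then parse the text as JSON."""
    return json.loads(payload.decode(charset))


# Made-up stand-in for urlopen(url).read(). The doubled backslash in this
# bytes literal produces the six bytes \ u 0 0 2 2, which is what a JSON
# API sends on the wire for an escaped quote.
payload = b'{"test":"\\u0022","name":"caf\xc3\xa9"}'

data = parse_json_bytes(payload)
print(data)
```

The escape is resolved by the JSON parser, not by Python's literal parsing, so the quote ends up safely inside the parsed value.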
2

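Your problem seems to be that you are copy-pasting a string into Python without handling the special characters. It is Python, not the json module, that turns \u0022 into a quote. That parsing only runs on string literals, or on content passed to eval. If you obtain the data the proper way, you won't have this problem: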

>>> import json
>>> import requests
>>> resp = requests.get("http://api.tumblr.com/v2/blog/distant-traveller.tumblr.com/posts?api_key=IkJtqSbg6Nd3OBnUdaGl9YWE3ocupygJcnPebHRou8eFbd4RUv&id=79086448801")
>>> json.loads(resp.text)
# Gives data, not an error
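
To see that data read at runtime never goes through literal parsing, here is a small sketch that swaps the HTTP request for an in-memory stream. The `io.BytesIO` object is only an assumed stand-in for a response body, and the backslash is built from its byte value so that no escape sequence in this source file is involved.

```python
import io
import json

# Build the wire bytes without relying on any escape sequence in the source:
# 0x5C is the backslash character.
BACKSLASH = bytes([0x5C])
wire_bytes = b'{"test":"' + BACKSLASH + b'u0022"}'

# Stand-in for a response body; reading it is a plain runtime operation.
response_body = io.BytesIO(wire_bytes)
text = response_body.read().decode("utf-8")

print(len(text))  # 17: the backslash, the u and the four digits are all still there
print(json.loads(text))
```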

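If you really do want to paste it into your source file, use a raw string. That disables Python's \u... parsing for that literal, so the string contains the raw characters rather than the single decoded character: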

>>> json.loads(r'{"test":"\u0022"}')
{'test': '"'}
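
As a rough illustration of the difference, the sketch below compares the plain literal, the raw literal, and a literal with a doubled backslash. The variable names are made up for this example.

```python
import json

plain = '{"test":"\u0022"}'     # Python replaces \u0022 with " when compiling the literal
raw = r'{"test":"\u0022"}'      # raw literal: backslash, u and digits are kept
doubled = '{"test":"\\u0022"}'  # escaped backslash: the same text as the raw literal

print(len(plain), len(raw), raw == doubled)

# The raw text is valid JSON; the json module resolves the escape itself.
good = json.loads(raw)

# The plain literal is no longer valid JSON, so parsing fails.
plain_error = None
try:
    json.loads(plain)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    plain_error = exc
print(good, plain_error)
```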
2

The problem is that you are using byte strings in your example. You can either request unicode, or decode them as in this example:

import json
txt = b'{"test":"\u0022"}'  # \u is not an escape in a bytes literal, so the backslash stays
json.loads(txt.decode())
Out[10]: {'test': '"'}

It may be clearer if you look at what the unicode literal should look like:

txt.decode()
Out[12]: '{"test":"\\u0022"}'
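
The doubled backslash in that output is only how the repr displays a single backslash. A minimal sketch, assuming Python 3.6 or later for the last step (where json.loads also accepts bytes directly):

```python
import json

txt = rb'{"test":"\u0022"}'  # raw bytes literal, so the backslash is kept as-is
text = txt.decode("utf-8")

# The decoded string holds ONE backslash; repr() shows it doubled.
print(text.count("\\"), repr(text).count("\\"))
print(text)        # shows the text itself
print(repr(text))  # shows the form you would type in source code

from_text = json.loads(text)
from_bytes = json.loads(txt)  # bytes input is accepted on Python 3.6+
print(from_text == from_bytes)
```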
3

If you get the strings from an API query, they have already been handled correctly. For example, when you write

'{"test":"\u0022"}'

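in a source file, Python takes \u0022 to mean that the string should contain a literal ". A string obtained from correctly written API code will instead contain a literal backslash, a u, and some digits. That is equivalent to writing this in a source file: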

'{"test":"\\u0022"}'

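If your code fails on the actual data returned by the API query, either the API itself is broken (possible, but uncommon), or you are doing something wrong when handling the data, most likely parsing the escape sequences twice.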
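
One way such double parsing could happen is an extra unescaping pass before the JSON parser runs. The sketch below is a hypothetical reconstruction of that mistake, not something taken from the question's code. It uses the `unicode_escape` codec to play the role of the extra pass.

```python
import json

# What a well-behaved API delivers: backslash, u, and four digits.
api_text = '{"test":"\\u0022"}'

# Hypothetical bug: resolving escapes once before handing the text to json.
double_parsed = api_text.encode("ascii").decode("unicode_escape")
print(double_parsed)  # the escape has already become a bare quote

double_parse_error = None
try:
    json.loads(double_parsed)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    double_parse_error = exc

# Parsing the untouched API text works.
correct = json.loads(api_text)
print(correct, double_parse_error)
```

If the real data fails in the same way, looking for a step like this between the API call and json.loads is a reasonable first check.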
