Error while decoding JSON

import json
import urllib
import re
import binascii

def asciirepl(match):
    s = match.group()
    return binascii.unhexlify(s[2:])

query = 'google'
p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+query+'&sl=en&tl=en&restrict=pr,de&client=te')
page = p.read()[2:-10] #As its returned as a function call

#To replace hex characters with ascii characters
p = re.compile(r'\\x(\w{2})')
ascii_string = p.sub(asciirepl, page)

#Now decoding cleaned json response
data = json.loads(ascii_string)

shadyabhi@archlinux /tmp $ python2 define.py
Traceback (most recent call last):
  File "define.py", line 19, in <module>
    data = json.loads(ascii_string)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 403 (char 403)

2 Answers

User

Answer 1 · edited 2024-06-06 06:37:48

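Character 403 is the first embedded quote inside "text". After your hex replacement the string looks like this, which is not valid JSON: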

{
   "type":"url",
   "text":"<a href="http://www.people-communicating.com/jargon-words.html">http://www.people-communicating.com/jargon-words.html</a>",
   "language":"en"
}

This is what the server actually returns. Note that it contains no embedded quotes:

{
    "type":"url",
    "text":"\\x3ca href\\x3d\\x22http://www.people-communicating.com/jargon-words.html\\x22\\x3ehttp://www.people-communicating.com/jargon-words.html\\x3c/a\\x3e",
    "language":"en"
}

The best approach is to decode the JSON first, and then un-hex each string as needed.
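A minimal sketch of that order of operations, written for Python 3 rather than the Python 2 used in the question. It assumes the response carries the hex escapes with a doubled backslash, as in the server output shown above, so that `json.loads` accepts it unchanged. The `payload` string, the `example.com` URL, and the helper names `unhex` and `unhex_all` are made up for illustration.

```python
import json
import re

# A literal backslash, "x", then exactly two hex digits.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def unhex(text):
    # Replace each backslash-x-HH sequence in an already decoded string
    # with the character that the two hex digits encode.
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

def unhex_all(value):
    # Walk a decoded JSON value and un-hex every string found in it.
    if isinstance(value, str):
        return unhex(value)
    if isinstance(value, list):
        return [unhex_all(item) for item in value]
    if isinstance(value, dict):
        return {key: unhex_all(item) for key, item in value.items()}
    return value

# Hypothetical payload in the doubled-backslash form (an assumption).
payload = r'{"type":"url","text":"\\x3ca href\\x3d\\x22http://example.com/\\x22\\x3elink\\x3c/a\\x3e","language":"en"}'

# Decode first, un-hex second: the quotes only appear after parsing.
data = unhex_all(json.loads(payload))
print(data["text"])
```

Because the quote characters are produced only after parsing has finished, they can no longer break the JSON syntax.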

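EDIT: If this really is invalid JSON, as Karl Knechtel says in a comment, Google should be told that their API is incorrect. If instead the Python implementation is rejecting valid JSON, then the Python developers should be told to fix it. Whatever workaround you use, it should be easy to remove once the problem is fixed.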

User

Answer 2 · edited 2024-06-06 06:37:48

Decoding the \x escapes can produce " characters, which need to be re-escaped because they end up inside strings encoded in the JSON data.

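def asciirepl(match):
  # Decode the two hex digits after "\x" into one character (Python 2 str)
  ch = binascii.unhexlify(match.group()[2:])
  # Re-escape quotes and backslashes so the enclosing JSON string stays valid
  return '\\' + ch if ch in '\\"' else ch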
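For illustration, here is a Python 3 sketch of the same re-escaping idea applied to a small input. It assumes the raw response contains single-backslash `\x` escapes; the `raw` sample string is hypothetical, and `chr(int(..., 16))` stands in for `binascii.unhexlify`, which returns bytes on Python 3.

```python
import json
import re

# A literal backslash, "x", then exactly two hex digits.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def asciirepl(match):
    # Decode the two hex digits into a single character.
    ch = chr(int(match.group(1), 16))
    # Quotes and backslashes must stay escaped inside a JSON string.
    return '\\' + ch if ch in ('"', '\\') else ch

# Hypothetical raw response fragment with single-backslash escapes.
raw = r'{"text":"say \x22hi\x22 \x3d ok"}'

cleaned = HEX_ESCAPE.sub(asciirepl, raw)
data = json.loads(cleaned)
print(cleaned)
print(data["text"])
```

The decoded quotes come out as `\"` in the cleaned text, so the parser treats them as part of the string rather than as its end.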

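This still does not handle control characters, so you may instead want to convert the \x escapes into \u escapes, which are described in the JSON standard and parsed by the json module. It has the advantage of being simpler :)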

def asciirepl(match):
  return '\\u00' + match.group()[2:]
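An end-to-end sketch of the `\u` conversion in Python 3, assuming the raw response uses single-backslash `\x` escapes (which JSON itself does not allow). The `raw` sample, the `example.com` URL, and the name `to_unicode_escape` are placeholders; the pattern is narrowed to hex digits, a small deviation from the `\w{2}` used in the question.

```python
import json
import re

# A literal backslash, "x", then exactly two hex digits.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def to_unicode_escape(match):
    # Rewrite backslash-x-HH as backslash-u-00HH, which JSON understands.
    return '\\u00' + match.group(1)

# Hypothetical raw response with single-backslash escapes (an assumption).
raw = r'{"type":"url","text":"\x3ca href\x3d\x22http://example.com/\x22\x3elink\x3c/a\x3e","language":"en"}'

cleaned = HEX_ESCAPE.sub(to_unicode_escape, raw)
data = json.loads(cleaned)
print(data["text"])
```

No character is ever decoded outside the JSON parser here, so quotes, backslashes, and control characters need no special handling.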
