Answers to: Using Python ijson to read a large JSON file containing multiple JSON objects



1 answer

Anonymous, 1 day ago

Unfortunately, the <a href="https://github.com/isagalaev/ijson" rel="nofollow noreferrer">ijson</a> library (v2.3, as of March 2018) cannot parse multiple JSON objects. It handles only a single top-level value, and if you try to parse a second one you get an error: <code>"ijson.common.JSONError: Additional data"</code>. See the bug reports here:
<ul>
<li><a href="https://github.com/isagalaev/ijson/issues/40" rel="nofollow noreferrer">https://github.com/isagalaev/ijson/issues/40</a></li>
<li><a href="https://github.com/isagalaev/ijson/issues/42" rel="nofollow noreferrer">https://github.com/isagalaev/ijson/issues/42</a></li>
<li><a href="https://github.com/isagalaev/ijson/issues/67" rel="nofollow noreferrer">https://github.com/isagalaev/ijson/issues/67</a></li>
<li><a href="https://stackoverflow.com/questions/34217042/python-how-do-i-parse-a-stream-of-json-arrays-with-ijson-library">python: how do I parse a stream of json arrays with ijson library</a></li>
</ul>
This is a serious limitation. However, as long as every JSON object is followed by a newline character (and contains no raw newlines itself), you can parse each one independently, line by line, like this:
<pre><code>import io
import ijson

# `filename` is the path to the newline-delimited JSON file; set it before running.
with open(filename, encoding="UTF-8") as json_file:
    cursor = 0  # counts characters read so far (text mode), not bytes
    for line_number, line in enumerate(json_file):
        print("Processing line", line_number + 1, "at cursor index:", cursor)
        cursor += len(line)
        if not line.strip():
            continue  # a blank line holds no JSON value, so skip it
        line_as_file = io.StringIO(line)
        # Use a new parser for each line
        json_parser = ijson.parse(line_as_file)
        for prefix, event, value in json_parser:
            print("prefix=", prefix, "event=", event, "value=", value)
</code></pre>
You are still streaming the file rather than loading it entirely into memory, so this works for large JSON files. It also uses the line-streaming technique from <a href="https://stackoverflow.com/questions/620367/how-to-jump-to-a-particular-line-in-a-huge-text-file">How to jump to a particular line in a huge text file?</a> and <code>enumerate()</code> from <a href="https://stackoverflow.com/questions/522563/accessing-the-index-in-python-for-loops">Accessing the index in 'for' loops?</a>.
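If each line holds one complete object that fits comfortably in memory, the same line-by-line loop can yield whole Python objects instead of parser events. The following is only a minimal sketch under that assumption. It swaps in the standard library's `json.loads` for ijson, which is not what the answer above uses, and the helper name `iter_json_lines` and the sample data are hypothetical.

```python
import io
import json


def iter_json_lines(text_stream):
    """Yield (line_number, obj) for each non-blank line of a text stream.

    Hypothetical helper: assumes exactly one complete JSON value per line.
    """
    for line_number, line in enumerate(text_stream, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # tolerate blank lines between objects
        yield line_number, json.loads(stripped)


# An in-memory stream stands in for a large file opened with open(...).
sample = io.StringIO('{"id": 1}\n\n{"id": 2, "tags": ["a", "b"]}\n')
records = list(iter_json_lines(sample))
print(records)
```

Because the helper is a generator, only one line is held in memory at a time when it is iterated directly rather than collected into a list. If a single line is itself too large to decode at once, the event-based ijson loop in the answer is the better fit.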
