Accessing a JSON file with Python raises "MemoryError"

Posted 2024-06-16 12:20:01


I am working with a JSON dataset (reddit data) that is 5 GB in size. A single record in my JSON data looks like this:

{"subreddit":"languagelearning","parent_id":"t1_cn9nn8v","retrieved_on":1425123427,"ups":1,"author_flair_css_class":"","gilded":0,"author_flair_text":"Lojban (N)","controversiality":0,"subreddit_id":"t5_2rjsc","edited":false,"score_hidden":false,"link_id":"t3_2qulql","name":"t1_cnau2yv","created_utc":"1420074627","downs":0,"body":"I played around with the Japanese Duolingo for awhile and basically if you're not near Fluency you won't learn much of anything.\n\nAs was said below, the only one that really exists is Chineseskill.","id":"cnau2yv","distinguished":null,"archived":false,"author":"Pennwisedom","score":1}

I am using Python to list the "subreddit" of every record in this data, but I get a memory error. My Python code and the error are below.

import json
data=json.loads(open('/media/RC_2015-01').read())
for item in data:
   name = item.get("subreddit")
   print name

Traceback (most recent call last):
  File "name_python.py", line 4, in <module>
    data=json.loads(open('/media/RC_2015-01').read())
MemoryError

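As far as I understand, I am trying to load a very large amount of data at once, which is why I get the memory error. Can anyone suggest another approach?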


1 answer
User
#1 · Posted 2024-06-16 12:20:01

You need to use an iterative parser such as ijson, which parses one record at a time instead of loading the whole file into memory.
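As a rough illustration of the iterative approach, here is a minimal sketch. It assumes the third-party ijson package is installed (for example with pip install ijson) and that the file is a single top-level JSON array of records. The function name iter_subreddits_ijson is made up for this example.

```python
def iter_subreddits_ijson(path):
    """Yield the "subreddit" field of each record in a top-level JSON array.

    Assumption: the file looks like [ {...}, {...} ].
    """
    # Third-party dependency, imported here so the requirement is explicit.
    import ijson

    # ijson works on a file object and reads it incrementally, so open in
    # binary mode and never call .read() on the whole file.
    with open(path, "rb") as f:
        # The prefix "item" selects each element of the top-level array;
        # records are built and yielded one at a time.
        for record in ijson.items(f, "item"):
            yield record.get("subreddit")
```

With this in place, a loop such as for name in iter_subreddits_ijson('/media/RC_2015-01'): print(name) would print the names while keeping only one record in memory at a time, provided the file really has the assumed structure.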

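As for your error message, make sure your data is valid JSON and that the records are enclosed in square brackets. This structure will parse correctly: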

[
 {...},
 {...}
]

whereas the following structure will raise an "Extra data" exception:

{...}
{...}
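If the file turns out not to be one bracketed array but one complete JSON object per line, the standard library is enough. That layout is an assumption here, suggested by the single-line sample record in the question. The sketch below is an alternative to an iterative parser rather than part of the answer above, and the function name iter_subreddits_by_line is hypothetical.

```python
import json


def iter_subreddits_by_line(path):
    """Yield the "subreddit" field of each record, one line at a time.

    Assumption: every non-empty line is one complete JSON object.
    """
    with open(path, "r", encoding="utf-8") as f:
        # Iterating over the file object reads a single line per step,
        # so memory use is bounded by the longest line, not the file size.
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield record.get("subreddit")
```

A call such as for name in iter_subreddits_by_line('/media/RC_2015-01'): print(name) would then print each subreddit name. Because every line is parsed separately, the "Extra data" error does not arise, since json.loads never sees more than one object at once.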
