I am working with a JSON dataset (Reddit data) that is 5 GB in size. A sample record from the data looks like this:
{"subreddit":"languagelearning","parent_id":"t1_cn9nn8v","retrieved_on":1425123427,"ups":1,"author_flair_css_class":"","gilded":0,"author_flair_text":"Lojban (N)","controversiality":0,"subreddit_id":"t5_2rjsc","edited":false,"score_hidden":false,"link_id":"t3_2qulql","name":"t1_cnau2yv","created_utc":"1420074627","downs":0,"body":"I played around with the Japanese Duolingo for awhile and basically if you're not near Fluency you won't learn much of anything.\n\nAs was said below, the only one that really exists is Chineseskill.","id":"cnau2yv","distinguished":null,"archived":false,"author":"Pennwisedom","score":1}
I am using Python to list the "subreddit" of every record in this data, but I get a memory error. My Python code and the error are below.
import json
data=json.loads(open('/media/RC_2015-01').read())
for item in data:
    name = item.get("subreddit")
    print name
Traceback (most recent call last):
  File "name_python.py", line 4, in <module>
    data=json.loads(open('/media/RC_2015-01').read())
MemoryError
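I understand that I am trying to load a very large amount of data at once, which is why I get the memory error. Can someone suggest an alternative approach?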
You need to use an iterative parser such as ijson, which parses one record at a time instead of loading the whole file into memory.
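The one-record-at-a-time idea can also be sketched with the standard library alone. This is a swapped-in technique, not ijson itself, and it assumes the dump stores one JSON object per line, which the sample record suggests but the thread does not confirm. The helper name `iter_subreddits` is hypothetical, and the sketch is written for Python 3.

```python
import json


def iter_subreddits(path):
    """Yield the "subreddit" field of each record, one line at a time.

    Assumes one JSON object per line (an assumption, not confirmed
    by the original thread). Only a single record is held in memory.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield record.get("subreddit")


# Example usage with the path from the question:
# for name in iter_subreddits("/media/RC_2015-01"):
#     print(name)
```

Because the function is a generator, memory use stays roughly constant regardless of file size. If the file is instead a single large bracketed array, an iterative parser such as ijson is the better fit.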
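Regarding your error message, make sure your data is valid JSON with square brackets around the records. A file that holds all records as one comma-separated array enclosed in square brackets parses correctly,
whereas records that simply follow one another without the enclosing brackets raise an "Extra data" exception.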
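The difference between the two layouts can be reproduced with a pair of made-up records. The record contents and variable names here are illustrative only and are not taken from the original answer; the sketch assumes Python 3's standard `json` module.

```python
import json

# Records wrapped in square brackets and separated by commas: valid JSON.
bracketed = '[{"subreddit": "a"}, {"subreddit": "b"}]'

# Records that follow one another with no enclosing array: not one JSON value.
concatenated = '{"subreddit": "a"}\n{"subreddit": "b"}'

records = json.loads(bracketed)
names = [record["subreddit"] for record in records]

try:
    json.loads(concatenated)
    error_message = None
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    error_message = str(exc)

print(names)
print(error_message)
```

The second call fails because `json.loads` expects exactly one JSON value and finds more text after the first object.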