JSON file not fully loading into Python

Posted 2024-04-26 05:21:06


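I am writing a Python chatbot with TensorFlow that uses the dump of every publicly available Reddit comment from the past few years, found here: https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/?st=j9udbxta&sh=69e4fee7. I downloaded the comments via torrent and everything seemed to go smoothly. However, when I read the JSON file into my Python program, the whole file does not seem to load. Each month of 2015 is about 15000 KB of data, but only the first 2600 lines of JSON load, while the actual file has hundreds of thousands of lines. When I look at the last line loaded from the JSON file, it appears to be cut off, since it stops in the middle of a sentence, like this: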

    {"subreddit":"sydney","author_flair_text":null,"id":"cqugtij","gilded":0,"removal_reason":null,"downs":0,"archived":false,"created_utc":"1430439358","link_id":"t3_34e5fd","ups":6,"subreddit_id":"t5_2qkob","name":"t1_cqugtij","score_hidden":false,"author_flair_css_class":null,"parent_id":"t1_cqttsc3","controversiality":0,"score":6,"author":"SilverMeteor9798","body":"As state transport minister almost every press release from Gladys had something in there about how the liberals were \"getting on with the job\" and blaming Labor for something. It wasn't necessarily false, it just got tiresome after a while particular
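A minimal sketch for checking whether the file on disk is itself cut off, as opposed to the reading code stopping early: it counts the bytes and lines and tests whether the final line is complete JSON. It assumes the dump stores one JSON object per line, which is what the reading code in this post expects; the function name `inspect_jsonl` is hypothetical.

```python
import json
import os


def inspect_jsonl(path):
    """Report size, line count, and whether the last line of a
    one-JSON-object-per-line file is complete."""
    size = os.path.getsize(path)
    line_count = 0
    last_line = b""
    # Read in binary mode so the check does not depend on text decoding.
    with open(path, "rb") as f:
        for line in f:
            line_count += 1
            last_line = line
    ends_with_newline = last_line.endswith(b"\n")
    try:
        json.loads(last_line)  # json.loads accepts bytes (Python 3.6+)
        last_line_ok = True
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        last_line_ok = False
    return {
        "bytes": size,
        "lines": line_count,
        "ends_with_newline": ends_with_newline,
        "last_line_parses": last_line_ok,
    }
```

If this reports `last_line_parses` as False with a line count near 2600 for the RC_2015-05 file, the file itself most likely ends mid-record, which would point at the download or extraction step rather than the Python reading code.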

Here is the code I use to read the JSON file:

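    import json

    timeframe = '2015-05'
    # Opens e.g. Data/reddit_data/2015/RC_2015-05 (one JSON object per line)
    with open("Data/reddit_data/{}/RC_{}".format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
        for row in f:
            row = json.loads(row)  # parse a single comment record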

where timeframe selects the JSON file for the May 2015 Reddit comments. When I run this code, I get this error:

    json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 368 (char 367)
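Because the loop passes one line at a time to `json.loads`, the "line 1 column 368 (char 367)" in this error is counted within that single comment record, not within the whole file. A minimal sketch that also reports the line number in the file, and can optionally skip malformed records, assuming the dump is UTF-8 with one JSON object per line (`iter_jsonl` and `skip_bad` are hypothetical names):

```python
import json


def iter_jsonl(path, skip_bad=False):
    """Yield one parsed object per line of a JSON-lines file.

    json.loads only sees a single line, so its "line 1 column N" refers
    to that one record; this also reports the line number in the file.
    """
    # Assumption: the dump is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                msg = "file line {}: {} at column {} ({} chars in record)".format(
                    lineno, err.msg, err.colno, len(line))
                if not skip_bad:
                    raise ValueError(msg) from err
                print("skipping " + msg)
                continue
            yield record
```

Being a generator, it keeps only one record in memory at a time, which matters for a file with hundreds of thousands of comments. Skipping bad lines only helps if they are isolated; if the failing line is the last one, the file is truncated and skipping it does not recover the missing data.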

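That makes sense to me, since the last line loaded from the JSON file is cut off, but how do I get Python to read the whole JSON file? I am following sentdex's chatbot tutorial on YouTube (https://www.youtube.com/watch?v=dvOnYLDg8_Y), and I get the same error even when I run his code. How can I load the entire JSON file so that I can read hundreds of thousands of comments? I have tried changing the buffer size and re-downloading the comments.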
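Changing `buffering` is unlikely to help: it only sets the size of the buffer `open()` uses, and iterating over the file object still yields every line up to the end of the file, so a reader that stops early most likely means the file on disk ends early. If the original compressed download is still available (to my knowledge the monthly dumps are distributed as bz2 archives; the `.bz2` path and the function name `iter_bz2_jsonl` are assumptions), one option is to stream it with the standard-library `bz2` module instead of extracting it first, which rules out an incomplete extraction as the cause:

```python
import bz2
import json


def iter_bz2_jsonl(path):
    """Yield one parsed object per line from a bz2-compressed JSON-lines file."""
    # "rt" opens the archive in text mode and decompresses on the fly,
    # so the uncompressed data never has to be written to disk.
    # Assumption: the content is UTF-8 with one JSON object per line.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

For example, `for comment in iter_bz2_jsonl("Data/reddit_data/2015/RC_2015-05.bz2"): ...` would walk every comment in the (hypothetically named) archive. If the archive itself was only partly downloaded, reading it this way should raise an `EOFError` instead of quietly stopping, which separates a bad download from a bad extraction.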


