How do I parse a .txt file into valid JSON and store the tweets in a CSV?

fin = open("sim.txt")
fout = open("output.txt", "w+")
delete_list = ['ObjectId(', 'NumberLong(', 'ISODate(', ')']
for line in fin:
    for word in delete_list:
        line = line.replace(word, "")
    fout.write(line)
fin.close()
fout.close()

{ "_id": "582f4fbd44b65941a0a81213", "contributors": null, "truncated": false, "text": "Tonight at 10 PM ET, 7 PM PT, on @FoxNews, a one hour special on me and my life by @HarveyLevinTMZ. Enjoy!", "is_quote_status": false, "in_reply_to_status_id": null, "id": "799660246788612100", "favorite_count": 15765, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "retweeted": false, "coordinates": null, "entities": { "symbols": [], "user_mentions": [{ "id": 1367531, "indices": [33, 41], "id_str": "1367531", "screen_name": "FoxNews", "name": "Fox News" }, { "id": 36098990, "indices": [83, 98], "id_str": "36098990", "screen_name": "HarveyLevinTMZ", "name": "Harvey Levin" }], "hashtags": [], "urls": [] }, "in_reply_to_screen_name": null, "in_reply_to_user_id": null, "retweet_count": 5251, "id_str": "799660246788612100", "favorited": false, "user": { "id": 25073877, "id_str": "25073877" }, "geo": null, "in_reply_to_user_id_str": null, "lang": "en", "created_at": "Fri Nov 18 17:07:14 +0000 2016", "in_reply_to_status_id_str": null, "place": null, "created_at_date": "2016-11-18T17:07:14Z" }
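One caveat with the replace-based cleanup above: removing every `)` also removes closing parentheses that belong to the tweet text itself. A minimal sketch of a narrower cleanup follows, assuming the wrappers always look like `ObjectId("...")`, `ISODate("...")`, or `NumberLong(123)` / `NumberLong("123")`. The helper name `unwrap_shell_types` and the sample string `raw` are made up for illustration.

```python
import json
import re

# Matches ObjectId("..."), ISODate("..."), NumberLong(123) or NumberLong("123").
# The wrapper names come from delete_list above; their exact shapes are an assumption.
WRAPPER = re.compile(r'\b(?:ObjectId|ISODate|NumberLong)\(\s*("[^"]*"|-?\d+)\s*\)')


def unwrap_shell_types(text):
    """Replace each wrapper with its inner value; other parentheses are left alone."""
    return WRAPPER.sub(r"\1", text)


# Hypothetical one-line record in the pre-cleanup format.
raw = (
    '{"_id": ObjectId("582f4fbd44b65941a0a81213"), '
    '"id": NumberLong(799660246788612100), '
    '"text": "Enjoy :)", '
    '"created_at_date": ISODate("2016-11-18T17:07:14Z")}'
)

tweet = json.loads(unwrap_shell_types(raw))
print(tweet["text"])
```

Because the pattern only matches a wrapper name followed by a quoted string or an integer, the `)` inside the tweet text survives and the result parses with `json.loads`.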

2 Answers

User

Answer 1 · edited 2024-04-20 05:22:07

This process can be simplified by using Pandas.

If the file's extension is not .json, make sure the file still contains valid JSON.

import pandas as pd

df = pd.read_json("path/to/input.txt")
df[["text", "created_at_date"]].to_csv("output.csv", index=False)

User

Answer 2 · edited 2024-04-20 05:22:07

Pay attention to the JSON path, and make sure your text file contains valid JSON.

/path/to/json/file.json

[{
        "_id": "dummyid1",
        "contributors": null,
        "truncated": false,
        "text": "Dummy tweet 1",
        "is_quote_status": false,
        "in_reply_to_status_id": null,
        "id": "799660246788612100",
        "favorite_count": 15765,
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "retweeted": false,
        "coordinates": null,
        "entities": {
            "symbols": [],
            "user_mentions": [{
                "id": 1367531,
                "indices": [33, 41],
                "id_str": "1367531",
                "screen_name": "FoxNews",
                "name": "Fox News"
            }, {
                "id": 36098990,
                "indices": [83, 98],
                "id_str": "36098990",
                "screen_name": "HarveyLevinTMZ",
                "name": "Harvey Levin"
            }],
            "hashtags": [],
            "urls": []
        },
        "in_reply_to_screen_name": null,
        "in_reply_to_user_id": null,
        "retweet_count": 5251,
        "id_str": "799660246788612100",
        "favorited": false,
        "user": {
            "id": 25073877,
            "id_str": "25073877"
        },
        "geo": null,
        "in_reply_to_user_id_str": null,
        "lang": "en",
        "created_at": "Fri Nov 18 17:07:14 +0000 2016",
        "in_reply_to_status_id_str": null,
        "place": null,
        "created_at_date": "2016-11-18T17:07:14Z"
    },
    {
        "_id": "dummyid2",
        "contributors": null,
        "truncated": false,
        "text": "Dummy tweet 2",
        "is_quote_status": false,
        "in_reply_to_status_id": null,
        "id": "799660246788612100",
        "favorite_count": 15765,
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "retweeted": false,
        "coordinates": null,
        "entities": {
            "symbols": [],
            "user_mentions": [{
                "id": 1367531,
                "indices": [33, 41],
                "id_str": "1367531",
                "screen_name": "FoxNews",
                "name": "Fox News"
            }, {
                "id": 36098990,
                "indices": [83, 98],
                "id_str": "36098990",
                "screen_name": "HarveyLevinTMZ",
                "name": "Harvey Levin"
            }],
            "hashtags": [],
            "urls": []
        },
        "in_reply_to_screen_name": null,
        "in_reply_to_user_id": null,
        "retweet_count": 5251,
        "id_str": "799660246788612100",
        "favorited": false,
        "user": {
            "id": 25073877,
            "id_str": "25073877"
        },
        "geo": null,
        "in_reply_to_user_id_str": null,
        "lang": "en",
        "created_at": "Fri Nov 18 17:07:14 +0000 2016",
        "in_reply_to_status_id_str": null,
        "place": null,
        "created_at_date": "2016-11-18T17:07:14Z"
    }
]

script.py

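The answer's original script is not preserved here. What follows is only a sketch of one way such a script could work with the standard `json` and `csv` modules. It assumes the input is a JSON array of tweets like the file above, and that only `text` and `created_at_date` are wanted (the same columns as in the Pandas answer). The function name `tweets_to_csv` and the temporary demo files are illustrative.

```python
import csv
import json
import tempfile
from pathlib import Path


def tweets_to_csv(json_path, csv_path, fields=("text", "created_at_date")):
    """Write the chosen fields of each tweet in a JSON file to a CSV file."""
    with open(json_path, encoding="utf-8") as src:
        tweets = json.load(src)  # raises json.JSONDecodeError on invalid JSON
    if isinstance(tweets, dict):
        tweets = [tweets]  # tolerate a file holding a single tweet object
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # extrasaction="ignore" drops every key that is not listed in fields.
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(tweets)
    return len(tweets)


# Demo with two dummy tweets written to a temporary directory.
sample = [
    {"_id": "dummyid1", "text": "Dummy tweet 1", "lang": "en",
     "created_at_date": "2016-11-18T17:07:14Z"},
    {"_id": "dummyid2", "text": "Dummy tweet 2", "lang": "en",
     "created_at_date": "2016-11-18T17:07:14Z"},
]
with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "file.json"
    dst_path = Path(tmp) / "output.csv"
    src_path.write_text(json.dumps(sample), encoding="utf-8")
    count = tweets_to_csv(src_path, dst_path)
    csv_text = dst_path.read_text(encoding="utf-8")

print(count)
print(csv_text)
```

For real data, call `tweets_to_csv("/path/to/json/file.json", "output.csv")` with your own paths. Nested values such as `user` or `entities` are skipped by this sketch and would need to be flattened separately before being added to `fields`.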
