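I am working with a non-nested JSON file containing data from Reddit, and I am trying to convert it to a CSV file using Python. Not every row has the same fields, and I keep getting this error: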
JSONDecodeError: Extra data: line 2 column 1
Here is my code:
^{pr2}$
Here are a few lines from the data:
{"author":"i_had_an_apostrophe","body":"\"It's not your fault.\"","author_flair_css_class":null,"link_id":"t3_5c0rn0","subreddit":"AskReddit","created_utc":1478736000,"subreddit_id":"t5_2qh1i","parent_id":"t1_d9t3q4d","author_flair_text":null,"id":"d9tlp0j"}
{"id":"d9tlp0k","author_flair_text":null,"parent_id":"t1_d9tame6","link_id":"t3_5c1efx","subreddit":"technology","created_utc":1478736000,"subreddit_id":"t5_2qh16","author":"willliam971","body":"9/11 inside job??","author_flair_css_class":null}
{"created_utc":1478736000,"subreddit_id":"t5_2qur2","link_id":"t3_5c44bz","subreddit":"excel","author":"excelevator","author_flair_css_class":"points","body":"Have you tried stepping through the code to analyse the values at each step?\n\n","author_flair_text":"442","id":"d9tlp0l","parent_id":"t3_5c44bz"}
{"created_utc":1478736000,"subreddit_id":"t5_2tycb","link_id":"t3_5c384j","subreddit":"OldSchoolCool","author":"10minutes_late","author_flair_css_class":null,"body":"**Thanks Hillary**","author_flair_text":null,"id":"d9tlp0m","parent_id":"t3_5c384j"}
My plan is to use every field that appears in the data as the CSV header, and to fill in NA wherever a record has no data for a given field.
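Your question is missing information about what you are trying to accomplish, so I am guessing. Note that CSV files do not use "nulls" to represent missing fields. They simply have the delimiters with nothing between them, as in
1,2,,4,5
which has no value for the third field. Also, how you open the CSV file depends on whether you are using Python 2 or Python 3. The code below is for Python 3.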
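As a quick illustration of the empty-field convention, here is a minimal sketch using only the standard library. It writes to an in-memory buffer (`io.StringIO`, chosen here just to keep the example self-contained); with a real file in Python 3 you would open it as `open(path, "w", newline="")`.

```python
import csv
import io

# In-memory stand-in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# The csv module writes None as an empty string, so the third field
# ends up as two delimiters with nothing between them.
writer.writerow([1, 2, None, 4, 5])

line = buffer.getvalue()
print(line, end="")  # 1,2,,4,5
```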
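You can write a small function that builds each row, pulling data where it is available and inserting nothing where it is not. What you call the header, I call the schema. Collect all the fields, remove duplicates and sort them, then build each record against the complete set of fields and write those records to the CSV.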
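The steps above could be sketched roughly as follows. This assumes the input holds one JSON object per line, as in your sample; if so, parsing the whole file as a single JSON document is the likely cause of the "Extra data" error, and parsing line by line avoids it. The function names and the two shortened sample records are made up for illustration.

```python
import csv
import io
import json


def read_records(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def build_schema(records):
    """Collect every field name, de-duplicated and sorted."""
    return sorted({key for record in records for key in record})


def build_row(record, schema):
    """Return values in schema order; absent or null fields become empty."""
    row = []
    for field in schema:
        value = record.get(field)
        row.append("" if value is None else value)
    return row


# Shortened stand-ins for the Reddit records, for illustration only.
sample_lines = [
    '{"id": "d9tlp0j", "author": "i_had_an_apostrophe", "author_flair_text": null}',
    '{"id": "d9tlp0l", "subreddit": "excel", "author_flair_text": "442"}',
]

records = read_records(sample_lines)
schema = build_schema(records)

out = io.StringIO()  # stand-in for open(path, "w", newline="")
writer = csv.writer(out, lineterminator="\n")
writer.writerow(schema)
for record in records:
    writer.writerow(build_row(record, schema))

csv_text = out.getvalue()
print(csv_text, end="")
```

With a real file, `read_records` can be given the open file object directly, since iterating over a text file yields one line at a time.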
I suggest you use the csv.DictWriter class. This class takes a file and a list of field names (which I inferred from your data sample).
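A minimal sketch of the `csv.DictWriter` approach, again assuming one JSON object per line. The function name `jsonl_to_csv` and its `missing` parameter are hypothetical. `restval` is the `DictWriter` argument that supplies a value for keys absent from a row; it does not affect keys that are present with a null value, so those are mapped separately here. Pass `missing=""` if you prefer the usual empty-field convention over NA.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path, missing="NA"):
    """Convert a one-object-per-line JSON file to CSV; return the header."""
    with open(src_path, encoding="utf-8") as src:
        records = [json.loads(line) for line in src if line.strip()]

    # Header: every field seen in any record, de-duplicated and sorted.
    fieldnames = sorted({key for record in records for key in record})

    # newline="" lets the csv module control line endings (Python 3).
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, restval=missing)
        writer.writeheader()
        for record in records:
            # restval only covers absent keys, so map JSON nulls as well.
            writer.writerow(
                {key: (missing if value is None else value)
                 for key, value in record.items()}
            )
    return fieldnames
```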