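Converting CSV to JSON in Python
I have an Excel sheet (saved as a CSV file) with four columns. The first and third columns contain words, and the second and fourth columns contain the frequencies of those words. It looks roughly like this: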
word1, freq1, word2, freq2
word3, freq3, word4, freq4
... and so on.
I have some code that converts this CSV file into a JSON file.
import csv
import json
csvfile = open('sample.csv', 'r')
jsonfile = open('sample.json', 'w')
fieldnames = ("feature","r", "feature","r")
reader = csv.DictReader(csvfile, fieldnames)
out = json.dumps( [ row for row in reader ] )
jsonfile.write(out)
This is very simple. However, the JSON file this code produces looks like this:
[{"r" : freq2 "feature" : "word2"} {"r" : freq1 "feature" : "word1"}{"r" : freq4 "feature" : "word4"}{"r" : freq3 "feature" : "word3"}]
I'd like to find a way to make the generated JSON file look like this instead:
[{"word1" : freq1}{"word2" :freq2}{"word3" :freq3}{"word4" :freq4}]
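In other words, I want the first column to serve as the key for the second column, and the third column to serve as the key for the fourth column.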
5 Answers
0
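Here is a solution for converting CSV to JSON in Python that doesn't require importing any libraries.
I'm sure the json and csv libraries work well, but I ended up not using them, so perhaps this will help someone else.
In short: this approach pulls the data out of the CSV and builds the JSON string by hand.
It's a bit clunky, but it works.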
# Set up paths and variables
csvfile = open('input.csv', 'r')
jsonfile = open('output.json', 'w')
arr = []
headers = []
# Read in the headers (first row); strip the trailing newline first
for header in csvfile.readline().strip().split(','):
    headers.append(header)
# Turn each row into a run of '"xx" : "yy",\n' pairs
for line in csvfile.readlines():
    lineStr = ''
    for i, item in enumerate(line.strip().split(',')):
        if i < 28:  # I skip the last two columns for my application
            lineStr += '"' + headers[i] + '" : "' + item + '",\n'
    arr.append(lineStr)
csvfile.close()
# Convert the array into a JSON string
jsn = '{\n "entries":['
jsnEnd = ']\n}'
for i in range(len(arr)):
    if i == len(arr) - 1:
        jsn += "{" + arr[i][:-2] + "}\n"  # drop the trailing comma; no comma after the last entry
    else:
        jsn += "{" + arr[i][:-2] + "},\n"  # drop the trailing comma
jsn += jsnEnd
# Write to file
jsonfile.write(jsn)
jsonfile.close()
print("Done.")
I'm adding this mainly as a reference for anyone else who might need this kind of script.
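Because the string is assembled by hand, it is worth checking that the result actually parses. The sketch below wraps the same idea in a function and leaves out the 28-column cap. The name build_entries_json and the sample lines are made up for illustration, and json.loads is used only to verify the output, not to produce it. Note that this approach does not escape values, so fields containing double quotes or backslashes would yield invalid JSON.

```python
import json  # used only to check the hand-built string, not to create it


def build_entries_json(lines):
    """Build a {"entries": [...]} JSON string from CSV lines (header first).

    Hypothetical helper: values are written as strings and are NOT escaped,
    so fields containing quotes or backslashes would produce invalid JSON.
    """
    headers = lines[0].strip().split(',')
    entries = []
    for line in lines[1:]:
        items = line.strip().split(',')
        pairs = ['"' + h + '" : "' + v + '"' for h, v in zip(headers, items)]
        entries.append('{' + ',\n'.join(pairs) + '}')
    return '{\n "entries":[' + ',\n'.join(entries) + ']\n}'


sample_lines = ["word,freq\n", "apple,3\n", "pear,5\n"]  # made-up data
jsn = build_entries_json(sample_lines)
parsed = json.loads(jsn)  # raises ValueError if the string is not valid JSON
print(parsed["entries"])
```

Joining the pairs with ',\n'.join avoids having to trim a trailing comma afterwards.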
0
Try this: https://github.com/samarjeet27/CSV-Mapper/
import csvmapper
# create the mapper
mapper = csvmapper.DictMapper([
    [
        {'name': 'word1'},
        {'name': 'word2'},
        {'name': 'word3'},
        {'name': 'word4'},
    ]
])
# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
converter = csvmapper.JSONConverter(parser)
# convert to JSON
print(converter.doConvert(False))
0
Suppose you have data like this:
feature, r, feature, r
word1, freq1, word2, freq2
word3, freq3, word4, freq4
If I may use my own library, here is an illustrative solution:
>>> import pyexcel
>>> r=pyexcel.SeriesReader("sample.csv")
>>> r[0]
['word1', ' freq1', ' word2', ' freq2']
>>> r[1]
['word3', ' freq3', ' word4', ' freq4']
>>> r.series()
['feature', ' r', ' feature', ' r']
>>> r.column_at(0)
['word1', 'word3']
>>> r.column_at(1)
[' freq1', ' freq3']
>>> r.column_at(2)
[' word2', ' word4']
>>> r.column_at(3)
[' freq2', ' freq4']
>>> a=zip(r.column_at(0),r.column_at(1))
>>> b=zip(r.column_at(2),r.column_at(3))
>>> a+b
[('word1', ' freq1'), ('word3', ' freq3'), (' word2', ' freq2'), (' word4', ' freq4')]
>>> j=open('sample.json', 'w')
>>> import json
>>> j.write(json.dumps(a+b))
>>> j.close()
>>> exit()
Here is the result:
[["word1", " freq1"], ["word3", " freq3"], [" word2", " freq2"], [" word4", " freq4"]]
As you can see, there are still spaces inside the quoted strings. You can strip them with a SheetFormatter:
>>> import pyexcel
>>> r=pyexcel.SeriesReader("sample.csv")
>>> def clean(value, type):
... return value.strip()
...
>>> r.add_formatter(pyexcel.formatters.SheetFormatter(str, clean))
>>> r.column_at(0)
['word1', 'word3']
>>> r.column_at(1)
['freq1', 'freq3']
>>> r.column_at(2)
['word2', 'word4']
>>> r.column_at(3)
['freq2', 'freq4']
More documentation is available on pyhosted.
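The sessions above appear to have been run on Python 2, where zip returns a list and a+b concatenates two lists. On Python 3, zip returns an iterator and a+b raises a TypeError. Here is a minimal sketch of the same pairing step on Python 3, with plain lists standing in for the values returned by r.column_at (the variable names are illustrative only):

```python
import json

# Stand-ins for r.column_at(0) ... r.column_at(3), already stripped
words_a = ['word1', 'word3']
freqs_a = ['freq1', 'freq3']
words_b = ['word2', 'word4']
freqs_b = ['freq2', 'freq4']

# list(...) materializes each zip iterator so the two can be concatenated
pairs = list(zip(words_a, freqs_a)) + list(zip(words_b, freqs_b))

# One single-key object per pair, the shape asked for in the question
records = [{word: freq} for word, freq in pairs]
result = json.dumps(records)
print(result)
```

Building a single-key dictionary per pair, rather than dumping the tuples directly, gives objects instead of two-element arrays in the JSON output.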
0
To elaborate a bit, could you try this?
import csv, json

def dump_to_json():
    csv_result = []
    # Assumes the first line of sample.csv is a header row: word1,freq1,word2,freq2
    with open('sample.csv', 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=',', quotechar='"'):
            csv_result.append({'word1': row['word1'], 'freq1': row['freq1'],
                               'word2': row['word2'], 'freq2': row['freq2']})
    # One object per row: the words become keys, the frequencies become values
    json_feed = [{c['word1']: c['freq1'], c['word2']: c['freq2']} for c in csv_result]
    with open('sample.json', 'w') as outfile:
        json.dump(json_feed, outfile)

dump_to_json()
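This function relies on DictReader taking the column names from the first line of the file. If the file has no header row, as in the question, the names can instead be passed through DictReader's fieldnames parameter, and skipinitialspace=True drops the blank that follows each comma. A sketch under those assumptions, reading made-up values from an in-memory string instead of a file:

```python
import csv
import io
import json

# Made-up sample standing in for a headerless sample.csv
sample = "apple, 3, pear, 5\nplum, 2, fig, 7\n"

fieldnames = ("word1", "freq1", "word2", "freq2")
reader = csv.DictReader(io.StringIO(sample), fieldnames=fieldnames,
                        skipinitialspace=True)

# One object per CSV row, keyed by the words in columns 1 and 3
json_feed = [{row["word1"]: row["freq1"], row["word2"]: row["freq2"]}
             for row in reader]
output = json.dumps(json_feed)
print(output)
```

The field names must be distinct: with repeated names, later columns overwrite earlier ones in each row's dictionary.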
0
Unfortunately, Python's DictReader isn't a good fit for what you need, but a little zip trick solves the problem.
import csv, json
# csv.reader yields plain lists, so no field names are needed
with open('sample.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    out = json.dumps([dict(zip(row[::2], row[1::2])) for row in reader])
with open('sample.json', 'w') as jsonfile:
    jsonfile.write(out)
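dict(zip(row[::2], row[1::2])) builds a dictionary that maps the value in each odd-numbered column (first, third) to the value in the even-numbered column that follows it (second, fourth).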
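Note that this yields one dictionary per CSV row, holding both word/frequency pairs. If each pair should become its own single-key object, as in the desired output in the question, one possible variation is sketched below. It reads made-up values from an in-memory string, and converting the frequencies with int is an assumption about the data.

```python
import csv
import io
import json

# Made-up sample standing in for sample.csv
sample = "apple, 3, pear, 5\nplum, 2, fig, 7\n"

# skipinitialspace=True drops the blank after each comma
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

records = []
for row in reader:
    # row[::2] -> words (columns 1, 3); row[1::2] -> frequencies (columns 2, 4)
    for word, freq in zip(row[::2], row[1::2]):
        records.append({word: int(freq)})  # assumes frequencies are integers

out = json.dumps(records)
print(out)
```

Keeping one object per pair also avoids losing data when the same word appears twice in a row, since duplicate keys in a single dictionary would overwrite each other.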