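Merging single-column CSV files into one 14-column CSV file
I currently have 14 CSV files, each holding one day of data (14 because I need to go back two weeks).
What I want to do is merge the data from these 14 CSV files into a single CSV file.
For example, if each CSV file contains: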
1
2
3
4
then I would like the resulting CSV file to look like this:
1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,
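(The real CSV files each have 288 rows.)
I am currently using some code I found in another question. It works fine with 2 or 3 CSV files, but when I add more files it only processes the first 3 and ignores the rest, and the code now looks very messy.
Sorry for the long code block, but this is what I have so far.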
def csvappend():
    with open('C:\dev\OTQtxt\\result1.csv', 'rb') as csv1:
        with open('C:\dev\OTQtxt\\result2.csv', 'rb') as csv2:
            with open('C:\dev\OTQtxt\\result3.csv', 'rb') as csv3:
                with open('C:\dev\OTQtxt\\result4.csv', 'rb') as csv4:
                    with open('C:\dev\OTQtxt\\result5.csv', 'rb') as csv5:
                        with open('C:\dev\OTQtxt\\result6.csv', 'rb') as csv6:
                            with open('C:\dev\OTQtxt\\result7.csv', 'rb') as csv7:
                                with open('C:\dev\OTQtxt\\result8.csv', 'rb') as csv8:
                                    with open('C:\dev\OTQtxt\\result9.csv', 'rb') as csv9:
                                        with open('C:\dev\OTQtxt\\result10.csv', 'rb') as csv10:
                                            with open('C:\dev\OTQtxt\\result11.csv', 'rb') as csv11:
                                                with open('C:\dev\OTQtxt\\result12.csv', 'rb') as csv12:
                                                    with open('C:\dev\OTQtxt\\result13.csv', 'rb') as csv13:
                                                        with open('C:\dev\OTQtxt\\result14.csv', 'rb') as csv14:
                                                            reader1 = csv.reader(csv1, delimiter=',')
                                                            reader2 = csv.reader(csv2, delimiter=',')
                                                            reader3 = csv.reader(csv3, delimiter=',')
                                                            reader4 = csv.reader(csv4, delimiter=',')
                                                            reader5 = csv.reader(csv5, delimiter=',')
                                                            reader6 = csv.reader(csv6, delimiter=',')
                                                            reader7 = csv.reader(csv7, delimiter=',')
                                                            reader8 = csv.reader(csv8, delimiter=',')
                                                            reader9 = csv.reader(csv9, delimiter=',')
                                                            reader10 = csv.reader(csv10, delimiter=',')
                                                            reader11 = csv.reader(csv11, delimiter=',')
                                                            reader12 = csv.reader(csv12, delimiter=',')
                                                            reader13 = csv.reader(csv13, delimiter=',')
                                                            reader14 = csv.reader(csv14, delimiter=',')
                                                            all = []
                                                            for row1, row2, row3, row4, row5, row6, row7, row8, row9, \
                                                                    row10, row11, row12, row13, row14 in zip(reader1, \
                                                                    reader2, reader3,\
                                                                    reader4, reader5, \
                                                                    reader7, reader8,\
                                                                    reader9, reader10, \
                                                                    reader11, reader12,\
                                                                    reader13,reader14):
                                                                row14.append(row1[0])
                                                                row14.append(row2[0])
                                                                row14.append(row3[0])
                                                                row14.append(row4[0])
                                                                row14.append(row5[0])
                                                                row14.append(row6[0])
                                                                row14.append(row7[0])
                                                                row14.append(row8[0])
                                                                row14.append(row9[0])
                                                                row14.append(row10[0])
                                                                row14.append(row11[0])
                                                                row14.append(row12[0])
                                                                row14.append(row13[0])
                                                                all.append(row14)
    with open('C:\dev\OTQtxt\TODAY.csv', 'wb') as output:
        writer = csv.writer(output, delimiter=',')
        writer.writerows(all)
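I think the indentation may have been slightly mangled when I copied the code, but you should get the idea. I don't expect you to read it line by line either; it is very repetitive.
I have seen similar questions recommend unix tools. In case anyone is about to suggest that, I should say that I am running this on Windows.
If anyone can help me tidy up the code so that it actually works, I would be very grateful!
3 Answers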
0
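I just tested this:
import csv
import glob
import os

folder = "C:\\dev\\OTQtxt"
out_path = os.path.join(folder, "one.csv")
# Collect every CSV in the folder except the output file itself.
files = [f for f in glob.glob(os.path.join(folder, "*.csv")) if f != out_path]
rows = []
for path in files:
    with open(path, 'r') as f:
        rows.append(f.read().splitlines())  # one list of values per input file
# newline='' stops the csv module from writing blank lines on Windows (Python 3).
with open(out_path, 'w', newline='') as oneFile:
    writer = csv.writer(oneFile)
    for row in rows:
        writer.writerow(row)  # pass the list itself; a joined string would be split into characters
This creates a file called one.csv in your folder containing the contents of all the *csv files, with each input file written as one row.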
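The question asks for one column per input file rather than one row per file. A minimal sketch of that column-wise merge, written for Python 3 and assuming every input file has a single value per line and the same number of lines; merge_columns is a hypothetical helper name, not part of the answer above:

```python
import csv


def merge_columns(paths, out_path):
    """Write one output row per line number, one column per input file."""
    columns = []
    for path in paths:
        with open(path, newline='') as f:
            # Each input file has one value per line; keep the first field only.
            columns.append([row[0] for row in csv.reader(f) if row])
    with open(out_path, 'w', newline='') as out:
        # zip(*columns) transposes the per-file lists into per-line rows.
        # zip stops at the shortest file, so the files should be equally long.
        csv.writer(out).writerows(zip(*columns))
```

The columns come out in the order of the paths list, so passing the files from oldest to newest day gives the oldest day in the first column.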
0
You can do it like this; the file names can also be generated in a loop:
import numpy as np

filenames = ['file1', 'file2', 'file3']  # all the files to be read in
data = []  # one list of values per file
for filename in filenames:
    with open(filename, 'r') as fp:
        data.append([line.strip() for line in fp])  # values of the current file
# transpose so that each file becomes a column (all files must have the same number of lines)
data = np.array(data).T
# join the inner elements with ',' and the rows with '\n'
data_string = '\n'.join(','.join(row) for row in data.tolist())
with open('newfile', 'w') as fp:
    fp.write(data_string)
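The question's files are named result1.csv through result14.csv, so the list of names can be built in a loop instead of being typed out. A minimal sketch, assuming that naming scheme holds for all 14 files; result_paths is a hypothetical helper name, not part of the answer above:

```python
import os


def result_paths(folder, count=14):
    """Return the paths result1.csv ... result<count>.csv inside folder, in numeric order."""
    return [os.path.join(folder, 'result%d.csv' % i) for i in range(1, count + 1)]


# Folder taken from the question; adjust it to wherever the files live.
filenames = result_paths(r'C:\dev\OTQtxt')
```

Building the names from the numbers keeps the days in order. A sorted directory listing would instead place result10.csv before result2.csv.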
2
Creating the files:
xxxx@xxxx:/tmp/files$ for i in {1..15}; do echo -e "1\n2\n3\n4" > "my_csv_$i.csv"; done
xxxx@xxxx:/tmp/files$ more my_csv_1.csv
1
2
3
4
xxxx@xxxx:/tmp/files$ ls
my_csv_10.csv my_csv_11.csv my_csv_12.csv my_csv_13.csv my_csv_14.csv my_csv_15.csv my_csv_1.csv my_csv_2.csv my_csv_3.csv my_csv_4.csv my_csv_5.csv my_csv_6.csv my_csv_7.csv my_csv_8.csv my_csv_9.csv
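Using itertools.zip_longest (called izip_longest in Python 2):
import os
from itertools import zip_longest  # izip_longest in Python 2

rows = []
# Read every CSV in the current directory except the output file itself (lexicographic order).
files = sorted(f for f in os.listdir('.') if f.endswith('.csv') and f != 'result.csv')
for f in files:
    with open(f) as f_in:
        rows.append(f_in.readlines())
with open('result.csv', 'w') as f_obj:
    # zip_longest yields one tuple per line number; shorter files are padded with ''.
    for row in zip_longest(*rows, fillvalue=''):
        f_obj.write(','.join(field.strip() for field in row) + '\n')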
Output:
xxxxx@xxxx:/tmp/files$ more result.csv
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
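This is not the best solution, because it holds all the data in memory, but you should get the idea. By the way, if your data is all numeric, I would suggest using numpy and trying multidimensional arrays.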
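One way around the memory concern is to keep all input files open and read them in lockstep, one line from each at a time. A minimal sketch for Python 3, assuming single-column files of equal length; merge_streaming is a hypothetical helper name and contextlib.ExitStack is swapped in here to manage the open files, neither is part of the answer above:

```python
import csv
from contextlib import ExitStack


def merge_streaming(paths, out_path):
    """Merge single-column files side by side, one line from each file at a time."""
    with ExitStack() as stack:
        # ExitStack closes every file opened through it when the block exits.
        inputs = [stack.enter_context(open(p)) for p in paths]
        out = stack.enter_context(open(out_path, 'w', newline=''))
        writer = csv.writer(out)
        # zip over file objects pulls one line from each file per step,
        # so only the current output row is held in memory.
        for lines in zip(*inputs):
            writer.writerow([line.strip() for line in lines])
```

With 14 files of 288 lines each, the in-memory version is perfectly adequate; the streaming form mainly matters when the inputs are much larger.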