How can I extract specific data from a downloaded CSV file and transpose it into a new CSV file?

Asked 2025-04-18 09:48

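I'm using an online survey application that lets me download the results as a CSV file. However, the downloaded CSV puts each survey question and its answer in a new column, whereas I need each question and answer on a new row. The download also contains a lot of data that I want to ignore completely.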

How can I extract just the rows and columns I need from the downloaded CSV file and write them to a new CSV file in a specific format?

For example, my downloaded data looks like this:

V1,V2,V3,Q1,Q2,Q3,Q4....
null,null,null,item,item,item,item....
0,0,0,4,5,4,5.... 
0,0,0,2,3,2,3....

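The first row contains the "keys" I need, except that V1 through V3 must be excluded. The second row must be excluded entirely. The third row is my first subject, so I need to pair its values 4,5,4,5 with the keys Q1,Q2,Q3,Q4. The fourth row is a new subject and must also be excluded, because my program can only handle one subject at a time.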

For my script to work, the CSV file I create needs to look like this:

Q1,4
Q2,5
Q3,4
Q4,5
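A minimal sketch of the selection rules described above, using Python 3's standard csv module. It assumes the layout shown in the sample (keys in row 0, row 1 ignored, first subject in row 2, exactly three leading V columns); the function name `extract_first_subject` and its `skip_cols` parameter are hypothetical, and the sample omits the "...." columns.

```python
import csv
import io

# Sample export from the question (the trailing "...." columns omitted)
SAMPLE = """V1,V2,V3,Q1,Q2,Q3,Q4
null,null,null,item,item,item,item
0,0,0,4,5,4,5
0,0,0,2,3,2,3
"""


def extract_first_subject(src, dst, skip_cols=3):
    """Pair the header keys with the first subject's answers, one pair per row.

    Row 0 holds the keys, row 1 is ignored, row 2 is the first subject and
    any later rows (other subjects) are ignored. The first `skip_cols`
    columns (V1..V3 in the sample) are dropped.
    """
    rows = list(csv.reader(src))
    keys, answers = rows[0][skip_cols:], rows[2][skip_cols:]
    # lineterminator="\n" writes Unix line endings instead of csv's default "\r\n"
    csv.writer(dst, lineterminator="\n").writerows(zip(keys, answers))


out = io.StringIO()
extract_first_subject(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files, the same function can be called as `extract_first_subject(src, dst)` inside `with open("CDI.csv", newline="") as src, open("CDI_test.csv", "w", newline="") as dst:`.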

I tried using izip to transpose the data, but I don't know how to select only the rows and columns I need:

import csv
from itertools import izip

# Transpose the whole file so rows become columns (Python 2)
with open("CDI.csv", "rb") as src, open("CDI_test.csv", "wb") as dst:
    csv.writer(dst).writerows(izip(*csv.reader(src)))
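In Python 3, `itertools.izip` no longer exists; the built-in `zip` replaces it. A sketch of transposing first and selecting afterwards, assuming the sample layout from the question (three V columns, key in row 0, first subject in row 2). Note that `zip(*rows)` silently truncates to the shortest row if rows have different lengths.

```python
import csv
import io

SAMPLE = ("V1,V2,V3,Q1,Q2,Q3,Q4\n"
          "null,null,null,item,item,item,item\n"
          "0,0,0,4,5,4,5\n"
          "0,0,0,2,3,2,3\n")

rows = list(csv.reader(io.StringIO(SAMPLE)))
# zip(*rows) transposes: each tuple is one original column,
# e.g. ("Q1", "item", "4", "2")
columns = list(zip(*rows))
# Skip the V1..V3 columns, then keep field 0 (the key) and
# field 2 (the first subject's answer) of each column
pairs = [(col[0], col[2]) for col in columns[3:]]
print(pairs)  # [('Q1', '4'), ('Q2', '5'), ('Q3', '4'), ('Q4', '5')]
```

`pairs` can then be passed straight to `csv.writer(...).writerows(pairs)`.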

3 Answers


I'd suggest looking into pandas, which is very well suited to this kind of task:

http://pandas.pydata.org/pandas-docs/stable/io.html

import pandas

input_dataframe = pandas.read_csv("input.csv")
transposed_df = input_dataframe.transpose()

# Drop unwanted rows/columns and edit the data here using DataFrame methods;
# pandas is a good library to get some experience working with

transposed_df.to_csv("output.csv")
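A more targeted pandas sketch for the exact output the question asks for. It assumes the sample layout: `skiprows=[1]` drops the "null,item,..." line, the first remaining data row is the subject, and every question column name starts with "Q" (that prefix test is an assumption, not something pandas requires).

```python
import io

import pandas as pd

SAMPLE = ("V1,V2,V3,Q1,Q2,Q3,Q4\n"
          "null,null,null,item,item,item,item\n"
          "0,0,0,4,5,4,5\n"
          "0,0,0,2,3,2,3\n")

# skiprows=[1] skips file line 1 (the "null,item,..." row); line 0 stays the header
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=[1])
# First subject = first data row, as a Series indexed by the column names
first = df.iloc[0]
# Keep only the question columns (assumes they all start with "Q")
first = first[first.index.str.startswith("Q")]
# header=False writes plain "key,value" lines; the index supplies the keys
csv_text = first.to_csv(header=False)
print(csv_text)
```

For a real file, `pd.read_csv("input.csv", skiprows=[1])` and `first.to_csv("output.csv", header=False)` do the same thing on disk.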

I've done something similar with multiple values, but it can be adapted to work with a single value.

#!/usr/bin/env python

import csv

def dict_from_csv(filename):
    '''
    (str) -> list of dict
    Read a CSV file and return its rows as a list of dictionaries.
    The header row supplies the keys; every other non-empty row becomes
    one dictionary of values.
    The layout of the CSV file and its headers must be known in advance
    to pick out the fields you need.
    '''

    # Read the file and keep every row that has content in mf.
    # The keys are the first content row, mf[0];
    # the values are the remaining rows, mf[1:].
    mf = []
    with open(filename, 'r', newline='') as f:
        for row in csv.reader(f):
            if any(row):
                mf.append(row)
    file_keys = mf[0]
    file_values = mf[1:]  # slice this to choose the row(s) you want

    # Zip the keys with each row of values, building one dictionary per row
    my_list = []
    for value in file_values:
        my_list.append(dict(zip(file_keys, value)))

    # Return the list of dictionaries
    return my_list
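The standard library's `csv.DictReader` builds the same header-keyed dictionaries one row at a time, which makes the single-subject case short. A sketch assuming the sample layout from the question, and assuming (hypothetically) that the question columns are exactly those whose names start with "Q":

```python
import csv
import io

SAMPLE = ("V1,V2,V3,Q1,Q2,Q3,Q4\n"
          "null,null,null,item,item,item,item\n"
          "0,0,0,4,5,4,5\n"
          "0,0,0,2,3,2,3\n")

reader = csv.DictReader(io.StringIO(SAMPLE))  # header row supplies the keys
next(reader)            # discard the "null,item,..." row
subject = next(reader)  # first subject as a dict: {"V1": "0", ..., "Q4": "5"}

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for key, value in subject.items():
    if key.startswith("Q"):  # assumption: question columns start with "Q"
        writer.writerow([key, value])
print(out.getvalue())
```

Because `DictReader` stops after the rows you ask for, later subjects are never read, which matches the one-subject-at-a-time requirement.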

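Here's a simple Python script that can do this for you. It takes arguments on the command line that specify which entries of each row to keep (how many to skip at the start of the row and, in the CSV version, the column index to stop before), along with the input and output file names. For example, the command might look like this: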

python question.py 3:7 input.txt output.txt

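If you don't want to type these arguments every time, you can replace sys.argv[1] in the script with the value itself (3 for the text version, "3:7" for the CSV version), sys.argv[2] with "input.txt", and so on.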
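Instead of editing `sys.argv` references by hand, `argparse` can supply default values for arguments that are left out. A sketch; the defaults are simply the values from the example command above, and the function name `parse_args` is hypothetical:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Pair survey keys with the first subject's answers.")
    # Each positional is optional (nargs="?") and falls back to its default
    parser.add_argument("columns", nargs="?", default="3:7",
                        help="column range to keep as start:end (end exclusive)")
    parser.add_argument("input", nargs="?", default="input.txt")
    parser.add_argument("output", nargs="?", default="output.txt")
    args = parser.parse_args(argv)
    start, end = (int(x) for x in args.columns.split(":"))
    return slice(start, end), args.input, args.output


cols, src, dst = parse_args([])  # no arguments given: all defaults
print(cols, src, dst)
```

The returned slice can be applied directly, e.g. `lines[0][cols]`.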

Plain-text version:

import sys

leadingRemoved = int(sys.argv[1])

with open(sys.argv[2], "r") as inputFile, open(sys.argv[3], "w") as outputFile:
    # Strip surrounding whitespace from each line, then split on ","
    lines = [x.strip().split(",") for x in inputFile]
    # Pair the first and third rows, skipping their first leadingRemoved elements
    zipped = zip(lines[0][leadingRemoved:], lines[2][leadingRemoved:])
    for tuples in zipped:
        # Write each question/number pair on its own line
        outputFile.write(",".join(tuples) + "\n")

# Command line: python question.py leadingRemoved pathToInput pathToOutput
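One reason to prefer the CSV version: splitting on "," by hand breaks as soon as an answer contains a quoted comma, while `csv.reader` honours the quoting. The sample line below is hypothetical:

```python
import csv

line = 'Q1,"Yes, mostly",4\n'

# Naive split breaks the quoted answer into two fields
naive = line.strip().split(",")
print(naive)   # ['Q1', '"Yes', ' mostly"', '4']

# csv.reader treats the quoted text as a single field
parsed = next(csv.reader([line]))
print(parsed)  # ['Q1', 'Yes, mostly', '4']
```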

CSV version:

import sys
import csv

leadingRemoved, endingremoved = [int(x) for x in sys.argv[1].split(":")]

with open(sys.argv[2], "r", newline="") as inputFile, \
        open(sys.argv[3], "w", newline="") as outputFile:
    # Remove NUL characters before parsing; this version expects a
    # tab-delimited export (use delimiter="," for a comma-separated file)
    reader = csv.reader((line.replace("\0", "") for line in inputFile),
                        delimiter="\t")
    # Build a 2D list holding every row's fields
    lines = [x for x in reader]
    # Pair the first and third rows, keeping columns leadingRemoved..endingremoved-1
    zipped = zip(lines[0][leadingRemoved:endingremoved],
                 lines[2][leadingRemoved:endingremoved])
    writer = csv.writer(outputFile)
    writer.writerows(zipped)
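NUL bytes scattered through an otherwise plain-text export are one common sign that the file is UTF-16 encoded. Assuming that is the cause here (check the file before relying on it), decoding it as UTF-16 is cleaner than deleting the NULs by hand. A sketch using an in-memory, hypothetical tab-separated export:

```python
import csv
import io

# Hypothetical tab-separated export encoded as UTF-16 (with byte-order mark)
raw = "V1\tQ1\tQ2\r\n0\t4\t5\r\n".encode("utf-16")
has_nul = b"\x00" in raw  # ASCII characters encoded as UTF-16 contain NUL bytes
print(has_nul)

# Decode as UTF-16 so the NULs disappear as part of decoding
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(rows)  # [['V1', 'Q1', 'Q2'], ['0', '4', '5']]
```

For a real file the equivalent is `open(path, encoding="utf-16", newline="")` in place of the `replace("\0", "")` step.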
