Table/data manipulation using Python dictionaries
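I need help finishing this Python script. I'm interning at a company and this is my first week. They asked me to develop a Python script that reads a .csv file and merges related columns into a single column, so that only about 15 necessary columns remain. For example, if there are columns for zip4, zip5, or postal code, they want all of those placed under one column called "Postal Code".
I only started learning Python this week, so please excuse the basic questions and the imprecise vocabulary. I'm not asking you to do this for me; I just want some guidance. I actually want to learn more about Python, so I'd be very grateful if someone could point me in the right direction.
I'm working with dictionary keys and values. The keys are the columns of the first row, and the values are the remaining rows for each key (from row 2 to about row 3000). Right now I only get a single key-value pair: only the last row ends up as my array of values, and there is only one key. I also get a KeyError, which means my key isn't being recognized correctly. My code so far is below. I'll keep working on it, and I'd really appreciate any help! Hopefully I can buy whoever helps me a drink and ask about their experience :)
Thanks for your time.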
# To be able to read csv formatted files, we will first have to import the csv module
import csv
# cols = line.split(',')# each column is split by a comma
#read the file
CSVreader = csv.reader(open('N:/Individual Files/Jerry/2013 customer list qc, cr, db, gb 9-19-2013_JerrysMessingWithVersion.csv', 'rb'), delimiter=',', quotechar='"')
# define open dictionary
SLSDictionary={}# no empty dictionary. Need column names to compare to.
i=0
#top row are your keys. All other rows are your values
#adjust loop
for row in CSVreader:
    # multiple loops needed here
    if i == 0:
        key = row[i]
    else:
        [values] = [row[1:]]
        SLSDictionary = dict({key: [values]}) # Dictionary is keys and array of values
    i=i+1
#print Dictionary to check errors and make sure dictionary is filled with keys and values
print SLSDictionary
# SLSDictionary has key of zip/phone plus any characters
#SLSDictionary.has_key('zip.+')
SLSDictionary.has_key('phone.+')
#value of key are set equal to x. Values of that column set equal to x
#[x]=value
#IF SLSDictionary has the key of zip plus any characters, move values to zip key
#if true:
# SLSDictionary['zip'].append([x])
#SLSDictionary['phone_home'].append([value]) # I need to append the values of the specific column, not all columns
#move key's values to correct, corresponding key
SLSDictionary['phone_home'].append(SLSDictionary[has_key('phone.+')])#Append the values of the key/column 'phone plus characters' to phone_home key/column in SLSDictionary
#if false:
# print ''
# go to next key
SLSDictionary.has_value('')
if true:
    print 'Error: No data in column'
# if there's no data in rows 1-?. Delete column
#if value <= 0:
# del column
print SLSDictionary
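1 Answer
I took a quick look and spotted a few errors. The main thing to watch out for is that you assign a brand-new value to the existing dictionary every time: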
SLSDictionary = dict({key: [values]})
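Every time you go through that loop, you reassign SLSDictionary. So in the end you are left with only the bottom-most entry. To add a key to the dictionary instead, you can do this: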
SLSDictionary[key] = values
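A minimal sketch of the difference, using made-up data (the names rows, rebound, and built are purely illustrative and do not appear in the question):

```python
# Hypothetical sample data: (key, values) pairs standing in for CSV content.
rows = [("a", ["1", "2"]), ("b", ["3", "4"]), ("c", ["5", "6"])]

# Rebinding the name on every pass throws away the previous dictionary.
rebound = {}
for key, values in rows:
    rebound = dict({key: values})

# Item assignment adds to (or updates) the same dictionary object.
built = {}
for key, values in rows:
    built[key] = values

print(rebound)  # only the last pair survives
print(built)    # all three keys are present
```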
Also, the brackets on this line aren't actually needed:
[values] = [row[1:]]
It should be:
values = row[1:]
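To see that the two forms bind the same thing, here is a small check on a made-up row (the row contents are an assumption for illustration):

```python
row = ["id42", "x", "y", "z"]  # hypothetical row

# Wrapping both sides in a one-element list and unpacking it...
[values] = [row[1:]]
# ...binds exactly the same list contents as the plain assignment.
plain = row[1:]

print(values == plain)
```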
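But most importantly, you will only ever have one key: key is assigned only when i is 0, and you increment i on every pass after that, so it is never set again. The dictionary therefore has a single key, and everything keeps being assigned to it. Without a sample of the CSV, I can't tell you exactly how to restructure the loop so that it captures all the keys.
Assuming your CSV looks like this, as you described: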
Col1, Col2, Col3, Col4
Val1, Val2, Val3, Val4
Val11, Val22, Val33, Val44
Val111, Val222, Val333, Val444
then you probably want code like this:
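# Stand-in for the rows csv.reader would yield: a header row, then data rows
dummy = [["col1", "col2", "col3", "col4"],
         ["val1", "val2", "val3", "val4"],
         ["val11", "val22", "val33", "val44"],
         ["val111", "val222", "val333", "val444"]]
column_index = []
SLSDictionary = {}
# The header row supplies the keys; each key starts with an empty list
for each in dummy[0]:
    column_index.append(each)
    SLSDictionary[each] = []
# Every remaining row contributes one value to each column's list
for each in dummy[1:]:
    for i, every in enumerate(each):
        try:
            if column_index[i] in SLSDictionary:
                SLSDictionary[column_index[i]].append(every)
        except IndexError:
            # The row has more cells than there are header columns; skip the extras
            pass
print(SLSDictionary)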
This gives:
{'col4': ['val4', 'val44', 'val444'], 'col2': ['val2', 'val22', 'val222'], 'col3': ['val3', 'val33', 'val333'], 'col1': ['val1', 'val11', 'val111']}
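The same idea applied to csv.reader might look like the sketch below. It assumes Python 3, and it uses an in-memory io.StringIO with invented contents in place of the real file, which I have not seen; with a file on disk you would pass an open file object to csv.reader instead.

```python
import csv
import io

# Stand-in for the real file (contents are made up for illustration).
sample = io.StringIO(
    "Col1,Col2,Col3\n"
    "Val1,Val2,Val3\n"
    "Val11,Val22,Val33\n"
)

reader = csv.reader(sample)
header = next(reader)                    # the first row holds the column names
columns = {name: [] for name in header}  # one empty list per column

for row in reader:
    # zip pairs each header name with the cell in the same position
    for name, cell in zip(header, row):
        columns[name].append(cell)

print(columns)
```

Because zip stops at the shorter of its two inputs, a row with extra or missing cells does not raise an error here; whether silently ignoring such rows is acceptable depends on your data.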
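If you want the columns to stay in order, change the dictionary's type to OrderedDict() (from the collections module). Plain dictionaries only preserve insertion order from Python 3.7 onward.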
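For the merging step itself, note that has_key only tests for an exact key (and no longer exists in Python 3), so a call like has_key('zip.+') does not do pattern matching. One possible approach is to match the column names with the re module. The sketch below is only an illustration under several assumptions: the function merge_columns, the sample data, the pattern, and the rule "keep the first non-empty cell in each row" are all mine, not something from the question.

```python
import re
from collections import OrderedDict

# Hypothetical column-oriented data, shaped like the dictionary built above.
columns = OrderedDict([
    ("name", ["Ann", "Bob"]),
    ("zip5", ["12345", ""]),
    ("zip4", ["", "6789"]),
    ("postal_code", ["", ""]),
])

def merge_columns(columns, pattern, target):
    """Fold every column whose name matches pattern into target.

    Assumed rule: for each row, keep the first non-empty cell among
    the matching columns, or an empty string if all are empty.
    """
    regex = re.compile(pattern)
    matching = [name for name in columns if regex.match(name)]
    n_rows = len(next(iter(columns.values()))) if columns else 0
    merged = []
    for i in range(n_rows):
        cells = [columns[name][i] for name in matching]
        merged.append(next((c for c in cells if c), ""))
    # Keep the non-matching columns in their original order, then add the target.
    result = OrderedDict((k, v) for k, v in columns.items() if k not in matching)
    result[target] = merged
    return result

merged = merge_columns(columns, r"zip|postal", "Postal Code")
print(merged)
```

If zip4 and zip5 are meant to be joined into one value rather than treated as alternatives, the rule inside the loop would need to change accordingly.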