Updating a CSV file by column name with Python
I have a CSV file like this:
product_name, product_id, category_id
book, , 3
shoe, 3, 1
lemon, 2, 4
I want to use Python's csv library to update the product_id of every row by supplying the column name.
For example, if I pass in:
update_data = {"product_id": [1,2,3]}
then the CSV file should become:
product_name, product_id, category_id
book, 1, 3
shoe, 2, 1
lemon, 3, 4
2 Answers
(Assuming you're using Python 3.x)
Python has a module called csv, part of the standard library, that helps you read and modify CSV files.
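One detail matters for the sample file: there is a space after each comma, so by default the header field is read as " product_id" rather than "product_id", and a lookup by name fails. A quick check on an in-memory copy of the data (io.StringIO stands in for the real file here):

```python
import csv
import io

sample = "product_name, product_id, category_id\nbook, , 3\n"

# Default reader: the space after each comma is kept as part of the field.
plain_header = next(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace that immediately follows a delimiter.
clean_header = next(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(plain_header)  # ['product_name', ' product_id', ' category_id']
print(clean_header)  # ['product_name', 'product_id', 'category_id']
```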
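Using this module, I would first find the index of the column you want and store it in the dictionary you created. Once the index is known, the rest is just putting each item of the list into the corresponding row.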
import csv

update_data = {"product_id": [None, [1, 2, 3]]}
# I've nested the original list inside another so that we can hold the column index in the first position.
line_no = 0
# Simple counter for the first step.
new_csv = []
# Holds the new rows for when we rewrite the file.
with open('test.csv', 'r', newline='') as csvfile:
    # skipinitialspace=True strips the space after each comma, so the header reads "product_id".
    filereader = csv.reader(csvfile, skipinitialspace=True)
    for line in filereader:
        if line_no == 0:
            for key in update_data:
                # This finds the column's index and stores it for us.
                update_data[key][0] = line.index(key)
        else:
            for key in update_data:
                # Using the column index, we put the new data into the correct place while removing it from the input list.
                line[update_data[key][0]] = update_data[key][1].pop(0)
        new_csv.append(line)
        line_no += 1

# newline='' stops the csv writer from producing blank lines between rows on Windows.
with open('test.csv', 'w', newline='') as csvfile:
    filewriter = csv.writer(csvfile)
    for line in new_csv:
        filewriter.writerow(line)
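The same index-based idea can be packaged as a reusable function. This is a sketch rather than part of the answer above: the name update_csv_columns is hypothetical, it looks up each column by name in the header once, and it keeps the existing value when a list runs out of items instead of raising IndexError from pop(0).

```python
import csv


def update_csv_columns(path, update_data):
    """Overwrite the named columns of the CSV file at `path`, row by row.

    `update_data` maps a column name to a list of new values; value i goes
    into data row i. Rows beyond the end of a list keep their old value.
    """
    with open(path, "r", newline="") as f:
        rows = list(csv.reader(f, skipinitialspace=True))

    header, body = rows[0], rows[1:]
    # Map each column name to its index once (raises ValueError if a name is missing).
    indexes = {name: header.index(name) for name in update_data}

    for row_no, row in enumerate(body):
        for name, values in update_data.items():
            if row_no < len(values):
                row[indexes[name]] = values[row_no]

    # Rewrite the whole file; newline="" is what the csv docs recommend.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(body)
```

With the question's file, update_csv_columns('test.csv', {"product_id": [1, 2, 3]}) produces the rows book,1,3 / shoe,2,1 / lemon,3,4. The csv writer does not reproduce the spaces after the commas.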
You can use your existing dict together with iter to take the items in order, for example:
import csv

update_data = {"product_id": [1, 2, 3]}
# Convert the values of your dict to be directly iterable so we can `next` them
to_update = {k: iter(v) for k, v in update_data.items()}
# Python 3: open CSV files in text mode with newline=''
with open('input.csv', 'r', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    # Create the CSV reader and writer, skipping the initial space so the header matches the update dict,
    # and write the header out
    csvin = csv.DictReader(fin, skipinitialspace=True)
    csvout = csv.DictWriter(fout, csvin.fieldnames)
    csvout.writeheader()
    for row in csvin:
        # Update rows: if we have something left and the column is in the update dictionary,
        # use that value; otherwise keep the value that's already in the column.
        row.update({k: next(to_update[k], row[k]) for k in row if k in to_update})
        csvout.writerow(row)
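This assumes each new value goes to the row at the matching position, and that the existing value is kept once the new values run out. You could also change this logic, for example to use a new value only when the existing one is empty (or by whatever other criterion you need).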
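As a sketch of that variant (fill a column only where the current value is empty), the DictReader/DictWriter version needs only a different condition. The helper name fill_empty is hypothetical, and io.StringIO stands in for the input and output files so the example runs on its own. A new value is consumed only when a blank cell is found, so values go to the blank cells in order rather than to fixed row positions.

```python
import csv
import io


def fill_empty(src, dst, update_data):
    """Copy CSV from `src` to `dst`, filling blank cells of the named columns.

    Each blank cell in a column listed in `update_data` takes the next unused
    value from that column's list; non-blank cells are left unchanged.
    """
    to_update = {k: iter(v) for k, v in update_data.items()}
    reader = csv.DictReader(src, skipinitialspace=True)
    writer = csv.DictWriter(dst, reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for k in to_update:
            if row[k] == "":
                # Keep the (empty) existing value once the list is exhausted.
                row[k] = next(to_update[k], row[k])
        writer.writerow(row)


src = io.StringIO("product_name, product_id, category_id\nbook, , 3\nshoe, 3, 1\nlemon, 2, 4\n")
dst = io.StringIO()
fill_empty(src, dst, {"product_id": [1, 2, 3]})
print(dst.getvalue())
```

With the question's data only the book row changes (its product_id becomes 1); shoe and lemon keep 3 and 2.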