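How can I read CSV file data row by row as named tuples?
How can I read a data file that has a header row into named tuples, so that each data row can be accessed by header name?
I tried something like this: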
import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)
However, a reader object cannot be indexed like a list, so the code above raises a TypeError. What is the Pythonic way to read the file's header into a named tuple?
3 Answers
0
I suggest trying this approach:
import csv
from collections import namedtuple

with open("data.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    Row = namedtuple('Row', next(reader))
    rows = [Row(*line) for line in reader]
If you use the pandas library, the solution is arguably more concise and elegant:
import pandas as pd
from collections import namedtuple
data = pd.read_csv("data.csv")
Row = namedtuple('Row', data.columns)
rows = [Row(*row) for index, row in data.iterrows()]
In both cases, you can work with the records by field name:
for row in rows:
    print(row.foo)
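Both variants above assume that every header is a valid Python identifier. As a rough sketch of one way to cope when that is not the case, the example below reads made-up in-memory data (the `SAMPLE` string and the `read_rows` helper are hypothetical, not part of the question) and passes `rename=True` to `namedtuple`, which replaces invalid, keyword, or duplicate field names with positional names such as `_1`:

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data: "unit price" contains a space and "class" is a
# Python keyword, so neither is usable as a namedtuple field name.
SAMPLE = "foo,unit price,class\n1,2.50,a\n3,4.75,b\n"


def read_rows(text):
    """Return the data rows of the CSV text as namedtuples named after the header."""
    reader = csv.reader(io.StringIO(text))
    # rename=True swaps unusable names for an underscore plus the column position.
    Row = namedtuple("Row", next(reader), rename=True)
    return [Row._make(line) for line in reader]


rows = read_rows(SAMPLE)
print(rows[0]._fields)
print(rows[0].foo, rows[0]._1)
```

Without `rename=True`, `namedtuple` would raise a `ValueError` for this header. Note that the values stay strings, as with any `csv.reader` output.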
30
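Have a look at csv.DictReader. In short, it takes the column names from the first row, which is exactly what you want. After that, you can access the columns of each row by name, just as with a dictionary.
If for some reason you still need to access the rows as collections.namedtuple, converting the dictionaries to named tuples is straightforward: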
import collections
import csv

with open('data_file.txt') as infile:
    reader = csv.DictReader(infile)
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
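To try this without a file on disk, here is a minimal sketch that feeds the same code from an in-memory string (the `SAMPLE` data is invented for illustration). It also shows one behaviour worth knowing: when a row has fewer fields than the header, `DictReader` fills the missing ones with `None`.

```python
import collections
import csv
import io

# Hypothetical sample data standing in for data_file.txt; the last row is
# deliberately one field short.
SAMPLE = "foo,bar\n1,2\n3\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
# Accessing fieldnames reads the header row from the input.
Data = collections.namedtuple("Data", reader.fieldnames)
tuples = [Data(**row) for row in reader]

for item in tuples:
    print(item.foo, item.bar)
```

A row with more fields than the header is a different matter: `DictReader` stores the surplus under the key `None`, so `Data(**row)` would fail for such a row.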
49
Use:
Data = namedtuple("Data", next(reader))
and omit this line:
next(reader)
Incorporating martineau's comment below, the iterating version of the example becomes, in Python 2:
import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...
And in Python 3:
import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
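If the rows are needed in more than one place, the Python 3 version could be wrapped in a generator. The following is only a sketch under a few assumptions: the helper name `iter_rows`, its `typename` parameter, and the temporary demo file are all made up for illustration, and an empty file is treated as yielding nothing.

```python
import csv
import os
import tempfile
from collections import namedtuple


def iter_rows(path, typename="Data"):
    """Yield each data row of the CSV file at `path` as a namedtuple."""
    with open(path, newline="") as infile:
        reader = csv.reader(infile)
        try:
            header = next(reader)  # get names from column headers
        except StopIteration:
            return  # empty file: nothing to yield
        Row = namedtuple(typename, header)
        # map() is lazy, so rows are built one at a time while the file is open.
        yield from map(Row._make, reader)


# Demo with a throwaway file containing hypothetical data.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", newline="", delete=False
) as tmp:
    tmp.write("foo,bar\n1,2\n3,4\n")
    demo_path = tmp.name

result = list(iter_rows(demo_path))
os.remove(demo_path)

for data in result:
    print(data.foo, data.bar)
```

The explicit `try`/`except` matters here: inside a generator, a `StopIteration` escaping from `next(reader)` would be turned into a `RuntimeError`.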