How can I read CSV file data row by row as named tuples?

38 votes
3 answers
17821 views
Asked 2025-04-17 11:04

How can I read a data file that includes a header row into a namedtuple, so that the data rows can be accessed by header name?

I tried something like this:

import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)

However, a reader object is not subscriptable the way a list is, so the code above raises a TypeError. What is the Pythonic way to read a file's header into a namedtuple?

3 Answers

0

I suggest trying this approach:

import csv
from collections import namedtuple

with open("data.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    Row = namedtuple('Row', next(reader))
    rows = [Row(*line) for line in reader]

If you use the Pandas library, the solution is even more concise:

import pandas as pd
from collections import namedtuple

data = pd.read_csv("data.csv")
Row = namedtuple('Row', data.columns)
rows = [Row(*row) for index, row in data.iterrows()]

In both cases, you can work with the records by field name:

for row in rows:
    print(row.foo)
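
As a minimal, self-contained sketch of the csv.reader variant: the in-memory sample data and the column names foo and bar are assumptions made for illustration and stand in for data.csv. Note that the csv module does no type conversion, so every field arrives as a string.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for data.csv
sample = io.StringIO("foo,bar\n1,2\n3,4\n")

reader = csv.reader(sample)
Row = namedtuple("Row", next(reader))  # the header row supplies the field names
rows = [Row(*line) for line in reader]

for row in rows:
    # Fields are strings; convert explicitly if numbers are needed
    print(row.foo, row.bar)
```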
30

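Take a look at csv.DictReader. In short, it takes the column names from the first row, which is exactly what you want. After that, you can access each row's columns by name, as with a dictionary.

If for some reason you still need to access the rows as collections.namedtuple instances, converting the dictionaries to named tuples is straightforward: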

import collections
import csv

with open('data_file.txt') as infile:
    reader = csv.DictReader(infile)
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
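
One detail worth knowing with this approach is how short rows behave. The sketch below uses hypothetical in-memory data (columns foo and bar are assumptions) in which the second data row is missing its last field; csv.DictReader fills the gap with its default restval of None, so the resulting named tuple still has every field.

```python
import collections
import csv
import io

# Hypothetical sample; the second data row is missing its last field
sample = io.StringIO("foo,bar\n1,2\n3\n")

reader = csv.DictReader(sample)
Data = collections.namedtuple("Data", reader.fieldnames)
tuples = [Data(**row) for row in reader]

for item in tuples:
    print(item.foo, item.bar)  # bar is None for the short row
```

Rows with more fields than the header are a different matter: DictReader stores the extras under the key None by default, which cannot be passed as a keyword argument, so Data(**row) would raise a TypeError for such a row.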
49

Use:

Data = namedtuple("Data", next(reader))

and omit this line:

next(reader)

Combined with martineau's comment below, the iterative version of the example in Python 2 becomes:

import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...

And in Python 3:

import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
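
If the header row might contain names that are not valid Python identifiers (spaces, keywords and so on), namedtuple raises a ValueError. One possible way around this, sketched here under assumptions, is the rename=True option, which swaps invalid names for positional ones such as _0 and _1. The helper name iter_rows and the sample columns are hypothetical and only serve the illustration.

```python
import csv
import io
from collections import namedtuple


def iter_rows(infile):
    """Yield one namedtuple per data row; field names come from the header."""
    reader = csv.reader(infile)
    # rename=True replaces invalid field names with _0, _1, ... by position
    Data = namedtuple("Data", next(reader), rename=True)
    yield from map(Data._make, reader)


# Hypothetical sample: "first name" has a space and "class" is a keyword
sample = io.StringIO("first name,class,foo\nAda,A,1\n")
rows = list(iter_rows(sample))

print(rows[0]._fields)
print(rows[0].foo)
```

The trade-off is that renamed columns lose their descriptive names, so this fits best when only some headers are problematic.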
