Converting a CSV to a dictionary in Python

10 votes
4 answers
28177 views
Asked 2025-04-15 16:59

I'm just starting to learn Python, and I want to create a class that loads the data from a CSV file into a dictionary.

I want to control the keys and values in the dictionary, so that I can, for example, get the data at any time with worker1.name or worker1.age.

class ageName(object):
    '''class to represent a person'''
    def __init__(self, name, age):
        self.name = name
        self.age = age

worker1 = ageName('jon', 40)
worker2 = ageName('lise', 22)

# Now if we print this you see that it's stored in a dictionary
print worker1.__dict__
print worker2.__dict__
'''
{'age': 40, 'name': 'jon'}
{'age': 22, 'name': 'lise'}
'''

# when we call (key) worker1.name we are getting the (value)
print worker1.name
'''
jon
'''

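But I'm having trouble loading the CSV data into the keys and values.

[1] I want to define the keys myself, like this: worker1 = ageName([name],[age],[id],[gender])

[2] Here [name], [age], [id] and [gender] are the data from specific columns of the CSV file.

I really don't know how to do this. I've tried many approaches, but they have all failed. I need some help getting started with this project.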

---- Edit

Here is my original code

import csv

# let us first make student an object

class Student():
    def __init__(self):
        self.fname = []
        self.lname = []
        self.ID = []
        self.sport = []
        # let us read this file
        for row in list(csv.reader(open("copy-john.csv", "rb")))[1:]:
            self.fname.append(row[0])
            self.lname.append(row[1])   
            self.ID.append(row[2])
            self.sport.append(row[3])
    def Tableformat(self):
        print "%-14s|%-10s|%-5s|%-11s" %('First Name','Last Name','ID','Favorite Sport')
        print "-" * 45
        for (i, fname) in enumerate(self.fname):
           print "%-14s|%-10s|%-5s|%3s" %(fname,self.lname[i],self.ID[i],self.sport[i])
    def Table(self):
        print self.lname

class Database(Student):
    def __init__(self):
        g = 0
        choice = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
        data = student.sport
        k = len(student.fname)
        print k
        freq = {}
        for i in data:
            freq[i] = freq.get(i, 0) + 1
        for i in choice:
            if i not in freq:
                freq[i] = 0
            print i, freq[i]


student = Student()
database = Database()

Here is my current code (not finished yet)

import csv
class Student(object):
    '''class to represent a person'''
    def __init__(self, lname, fname, ID, sport):
        self.lname = lname
        self.fname = fname
        self.ID = ID
        self.sport = sport
reader = csv.reader(open('copy-john.csv'), delimiter=',', quotechar='"')
student = [Student(row[0], row[1], row[2], row[3]) for row in reader][1::]
print "%-14s|%-10s|%-5s|%-11s" %('First Name','Last Name','ID','Favorite Sport')
print "-" * 45
for i in range(len(student)):
    print "%-14s|%-10s|%-5s|%3s" %(student[i].lname,student[i].fname,student[i].ID,student[i].sport)

choice = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
lst = []
h = 0
k = len(student)
# 23
for i in range(len(student)):
    lst.append(student[i].sport) # merge together

for a in set(lst):
    print a, lst.count(a)

for i in set(choice):
    if i not in set(lst):
        lst.append(i)
        lst.count(i) = 0
        print lst.count(i)

4 Answers

8

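I second Mark's suggestion. In particular, take a look at DictReader in the csv module: it reads each row of a comma-separated (or, more generally, delimiter-separated) file as a dictionary keyed by the column names in the header row.

See the PyMOTW article on the csv module, which has a quick reference and examples of using DictReader and DictWriter.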
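For reference, here is a minimal Python 3 sketch of DictReader. The column names and sample rows are hypothetical, and io.StringIO stands in for a real file opened with open('workers.csv', newline='').

```python
import csv
import io

# Hypothetical sample data; the first line is the header DictReader uses as keys.
SAMPLE = """name,age,id,gender
jon,40,1,M
lise,22,2,F
"""

def load_rows(f):
    """Return one plain dict per data row, keyed by the header's column names."""
    return [dict(row) for row in csv.DictReader(f)]

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]['name'], rows[0]['age'])  # jon 40
```

Every value comes back as a string (rows[0]['age'] is '40', not 40), so numeric columns still need an explicit int() conversion.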

9

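I know this question is a bit old, but looking at it, it's hard not to think of a great new (well, not that new) Python library called pandas. Its main unit of analysis is something called a DataFrame, a concept modeled on the way R handles data.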

Suppose you have a (very simple) CSV file called example.csv whose contents look like this:

day,fruit,sales
Monday,Banana,10
Monday,Orange,20
Tuesday,Banana,12
Tuesday,Orange,22

If you want to quickly read a CSV file and do some work with it, the following code is hard to beat for conciseness and ease of use:

>>> import pandas as pd
>>> csv = pd.read_csv('example.csv')
>>> csv
       day   fruit  sales
0   Monday  Banana     10
1   Monday  Orange     20
2  Tuesday  Banana     12
3  Tuesday  Orange     22
>>> csv[csv.fruit=='Banana']
       day   fruit  sales
0   Monday  Banana     10
2  Tuesday  Banana     12
>>> csv[(csv.fruit=='Banana') & (csv.day=='Monday')]
      day   fruit  sales
0  Monday  Banana     10

In my opinion, this is really great. No more processing csv.reader objects row by row!

12
import csv

reader = csv.reader(open('workers.csv', newline=''), delimiter=',', quotechar='"')
workers = [ageName(row[0], row[1]) for row in reader]

Now workers holds a list of all the workers:

>>> workers[0].name
'jon'
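If you want worker.name-style access without writing a class yourself, collections.namedtuple is one option. Below is a minimal Python 3 sketch. It assumes the file has a header line with the hypothetical columns name, age, id and gender, which is skipped with next(). It also builds a dictionary keyed by id, since a dictionary is what the question originally asked for.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type; the field names are assumed to match the CSV columns.
Worker = namedtuple('Worker', ['name', 'age', 'id', 'gender'])

# Hypothetical sample data; in practice pass open('workers.csv', newline='') instead.
SAMPLE = """name,age,id,gender
jon,40,1,M
lise,22,2,F
"""

def load_workers(f):
    reader = csv.reader(f)
    next(reader)  # skip the header row so it does not become a Worker
    # csv yields strings only, so the age column is converted explicitly
    return [Worker(name, int(age), id_, gender) for name, age, id_, gender in reader]

workers = load_workers(io.StringIO(SAMPLE))
by_id = {w.id: w for w in workers}  # dictionary keyed by the (string) id column

print(workers[0].name, workers[0].age)  # jon 40
print(by_id['2'].name)                  # lise
```

The tuple unpacking in load_workers raises a ValueError if a row does not have exactly four fields. That is usually what you want for a malformed file.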

Edit added after the question was updated

Why are you still using old-style classes? I'm using a new-style class here.

class Student(object):
    sports = []  # class attribute shared by all instances
    def __init__(self, row):
        self.lname, self.fname, self.ID, self.sport = row
        self.sports.append(self.sport)
    def get(self):
        return (self.lname, self.fname, self.ID, self.sport)

reader = csv.reader(open('copy-john.csv'), delimiter=',', quotechar='"')
print "%-14s|%-10s|%-5s|%-11s" % tuple(reader.next()) # read header line from csv
print "-" * 45
students = list(map(Student, reader)) # read all remaining lines
for student in students:
    print "%-14s|%-10s|%-5s|%3s" % student.get()

# Printing all sports that are specified by students
for s in set(Student.sports): # class attribute
    print s, Student.sports.count(s)

# Printing sports that are not picked 
allsports = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
for s in set(allsports) - set(Student.sports):
    print s, 0

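Hopefully this gives you some idea of how powerful Python's sequences are. ;)

Edit 2: making it as short as possible... just to show off :P

Ladies and gentlemen, 7(.5) lines.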
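The counting step can also be done with collections.Counter. Here is a minimal Python 3 sketch. sport_frequencies is a hypothetical helper name, and any sport that is not in the fixed list is simply left out of the result.

```python
from collections import Counter

ALL_SPORTS = ['Basketball', 'Football', 'Other', 'Baseball', 'Handball',
              'Soccer', 'Volleyball', 'I do not like sport']

def sport_frequencies(picked, all_sports=ALL_SPORTS):
    """Map every sport in all_sports to how many times it appears in picked."""
    counts = Counter(picked)
    # Counter returns 0 for missing keys, so unpicked sports get a zero count
    return {s: counts[s] for s in all_sports}

freq = sport_frequencies(['Soccer', 'Basketball', 'Soccer'])
print(freq['Soccer'], freq['Handball'])  # 2 0
```

A single Counter pass replaces calling list.count() once per sport, which rescans the whole list each time.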

allsports = ['Basketball','Football','Other','Baseball','Handball',
             'Soccer','Volleyball','I do not like sport']
sports = []
reader = csv.reader(open('copy-john.csv'))
for row in reader:
    if reader.line_num > 1: sports.append(row[3])  # skip the header row
    print "%-14s|%-10s|%-5s|%-11s" % tuple(row)
for s in allsports: print s, sports.count(s)
