Python csv 根据另一列的名称统计一列中的项目数量
我刚开始学习Python编程。我有一个很大的CSV文件(大约有5000条数据)。里面有两列数据需要我进行统计。最简单的方式就是给你展示一下这个CSV文件的几行内容:
Name column OPTIONALDATA5 column
Collaborative Desk Broward
Collaborative Desk Broward
Academic Desk Broward
Academic Desk Broward
Academic Desk Broward
Academic Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Broward Broward
Alachua Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
在上面的例子中,我想要的结果是这样的:
Broward:
collaborative Desk - 6
Academic Desk - 4
Broward - 1
Alachua:
collaborative Desk - 5
Alachua - 1
也许还需要一个总计,然后再继续处理电子表格中的下一个库。
我开始写代码了,但我在想是不是还有更好的方法来完成这个任务。
2 个回答
1
这个方法也可以用(假设你的文件是用 \t
来分隔的):
import itertools
import operator
import csv
import collections
results = collections.defaultdict(lambda: collections.defaultdict(int))
with open('sample.csv', 'r') as f_in:
f_in.seek(0)
rdr = csv.reader(f_in, delimiter='\t')
next(rdr)
for row in rdr:
results[row[1]][row[0]] += 1
for k, v in results.iteritems():
print "%s" % k
for k2, v2 in v.iteritems():
print " %s - %s" % (k2, v2)
输出结果:
Alachua
Alachua - 1
Collaborative Desk - 5
Broward
Collaborative Desk - 6
Academic Desk - 4
Broward - 1
3
假设数据是用制表符分隔的,这是一种获取你想要的结果的方法:
import csv
from collections import defaultdict, Counter
input_file = open('data')
csv_reader = csv.reader(input_file, delimiter='\t')
data = defaultdict(list)
for row in csv_reader:
data[row[1]].append(row[0])
现在数据将包含:
{'Alachua': ['Alachua', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk'],
'Broward': ['Collaborative Desk', 'Collaborative Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Broward']}
你可以遍历每个键的值列表,计算总数,或者在Python中使用Counter
方法,如下所示:
for k, v in data.items():
print k
print Counter(v)
这将输出:
Alachua
Counter({'Collaborative Desk': 5, 'Alachua': 1})
Broward
Counter({'Collaborative Desk': 6, 'Academic Desk': 4, 'Broward': 1})