Python CSV - need to group rows by a key and compute values

4 votes

2 answers

10209 views

Asked 2025-04-16 13:50

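I have a simple three-column CSV file. I need to use Python to group the rows by one key, compute the average of another key, and return the result. The file is in standard CSV format and looks like this: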

ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40

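In short, I need the average rate (third column, col[2]) for each unique zip code (second column, col[1]). That is, the average rate over all records whose zip code is 19003, over all records whose zip code is 19083, and so on.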

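I tried using the csv module to read the file into a dictionary and then sorting the dictionary by the unique values in the zip code column, but I didn't get anywhere.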

Any help or suggestions would be much appreciated.

Tags: dictionary operations, data processing, data analysis, csv, grouping, average calculation, zip code

2 Answers

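When I need to do more complex processing, I usually load the CSV data into a relational database table (sqlite is the quickest way to do that), then use standard SQL to extract the data and compute the averages: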

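import csv
import sqlite3
from io import StringIO

# sample rows without the header line; a real file would be opened with open()
data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

f = StringIO(data)
reader = csv.reader(f)

# an in-memory database is enough for a one-off aggregation
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()

for e in reader:
    # convert the rate to a float before inserting the row
    e[2] = float(e[2])
    c.execute("""insert into data
          values (?,?,?)""", e)

conn.commit()

# let SQL group the rows and compute the average rate per zip code
c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print(row)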
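The same idea can be wrapped in a function that reads a file with a header row, as in the question. This is only a sketch under a few assumptions: the helper name `average_rate_by_zip` is made up for illustration, the input has exactly the three columns ID, ZIPCODE, RATE, and a space may follow each comma (which `skipinitialspace=True` removes).

```python
import csv
import sqlite3
from io import StringIO


def average_rate_by_zip(csv_file):
    """Return [(zipcode, average_rate), ...] for a CSV with a header row.

    Hypothetical helper, not part of any library.
    """
    # skipinitialspace drops the blank after each comma ("1, 19003, 27.50")
    reader = csv.reader(csv_file, skipinitialspace=True)
    next(reader, None)  # skip the "ID, ZIPCODE, RATE" header line

    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('create table data (ID text, ZIPCODE text, RATE real)')
        # insert every non-empty row, converting the rate to a float
        conn.executemany(
            'insert into data values (?,?,?)',
            ((row[0], row[1], float(row[2])) for row in reader if row),
        )
        cursor = conn.execute(
            'select ZIPCODE, avg(RATE) from data '
            'group by ZIPCODE order by ZIPCODE'
        )
        return cursor.fetchall()
    finally:
        conn.close()


# the sample data from the question, including header and spaces
sample = StringIO(
    "ID, ZIPCODE, RATE\n"
    "1, 19003, 27.50\n"
    "2, 19003, 31.33\n"
    "3, 19083, 41.4\n"
    "4, 19083, 17.9\n"
    "5, 19102, 21.40\n"
)
for zipcode, average in average_rate_by_zip(sample):
    print(zipcode, round(average, 3))
```

For a file on disk, passing `open('data.csv', newline='')` instead of the `StringIO` object should work the same way.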

Answered 2025-04-16 by Python大师


I've commented each step to make it clearer what is going on:

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)
# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number.
# skipinitialspace drops the blank that follows each comma in the file.
with open('data.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f, skipinitialspace=True)):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 which evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, level = row
        # for each zipcode, add the level to the list
        data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.items():
    print(zipcode, sum(levels) / len(levels))

Output (the order of the rows may differ):

19102 21.4
19003 29.415
19083 29.65
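If the file is large, keeping every rate in a list is not strictly necessary: a running sum and count per zip code is enough to compute the average. The following is a rough sketch of that variation, not the approach above. The function name `mean_rate_by_zip` is hypothetical, and it assumes the header names ID, ZIPCODE, RATE from the question.

```python
import csv
from collections import defaultdict
from io import StringIO


def mean_rate_by_zip(csv_file):
    """Return {zipcode: average_rate} using running totals (hypothetical helper)."""
    # zipcode -> [sum of rates, number of rows]
    totals = defaultdict(lambda: [0.0, 0])
    # DictReader uses the header line for the keys and skips blank rows;
    # skipinitialspace removes the blank after each comma
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        entry = totals[row['ZIPCODE']]
        entry[0] += float(row['RATE'])
        entry[1] += 1
    return {zipcode: total / count for zipcode, (total, count) in totals.items()}


# the sample data from the question
sample = StringIO(
    "ID, ZIPCODE, RATE\n"
    "1, 19003, 27.50\n"
    "2, 19003, 31.33\n"
    "3, 19083, 41.4\n"
    "4, 19083, 17.9\n"
    "5, 19102, 21.40\n"
)
for zipcode, average in mean_rate_by_zip(sample).items():
    print(zipcode, round(average, 3))
```

The trade-off is that the individual rates are no longer available afterwards, so this only fits cases where the average is all that is needed.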

Answered 2025-04-16 by Python大师

