Processing CSV data in Python without using Pandas

Subject, Session, Course, Size, Category, Sprint, Jog, Walk
John Doe, Session2, 17, 2, Bad, 25s, 36s, 55s
John Doe, Session2, 3, 2, Good, 26s, 35s, 45s
John Doe, Session2, 1, 2, Good, 22s, 31s, 47s
John Doe, Session3, 5, 2, Good, 16s, 32s, 55s
John Doe, Session3, 2, 2, Good, 13s, 24s, 52s
John Doe, Session3, 16, 2, Bad, 15s, 26s, 49s

2 answers

User
Answer 1 · edited 2024-06-10 11:56:28

Given your input, these built-in Python standard-library modules produce the output you want:
import csv
from itertools import groupby
from operator import itemgetter
from collections import defaultdict

with open('input.csv', 'r', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    # skipinitialspace is needed because the sample data has spaces after the comma delimiters.
    reader = csv.DictReader(fin, skipinitialspace=True)
    # The output file will have these fieldnames.
    writer = csv.DictWriter(fout, fieldnames='Subject Session Sprint Jog Walk'.split())
    writer.writeheader()
    # For each subject/session, groupby returns a 2-tuple of the sort key and an
    # iterator over the rows with that key. The data must already be sorted by the key!
    for (subject, session), group in groupby(reader, key=itemgetter('Subject', 'Session')):
        # Build the row to output. defaultdict(int) assumes 0 if a key doesn't exist yet.
        row = defaultdict(int)
        row['Subject'] = subject
        row['Session'] = session
        # Count the rows for the average.
        count = 0
        for item in group:
            count += 1
            # Sum the times, removing the trailing 's'.
            for col in ('Sprint', 'Jog', 'Walk'):
                row[col] += int(item[col][:-1])
        # Turn the sums into averages.
        for col in ('Sprint', 'Jog', 'Walk'):
            row[col] /= count
        writer.writerow(row)
Output:
Subject,Session,Sprint,Jog,Walk
John Doe,Session2,24.333333333333332,34.0,49.0
John Doe,Session3,14.666666666666666,27.333333333333332,52.0
Documentation links: itemgetter, groupby, defaultdict
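To try the same groupby approach without touching files, here is a minimal sketch that wraps the logic in a function reading from any iterable of lines. The function name `average_times`, the `TIME_COLUMNS` constant, and the use of `io.StringIO` for the sample are assumptions for illustration; like the answer's code, it assumes the rows are already sorted by (Subject, Session) and that each time ends in a single `s`.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical constant: the time columns to average.
TIME_COLUMNS = ('Sprint', 'Jog', 'Walk')


def average_times(lines):
    """Average the time columns per (Subject, Session) group.

    Assumes rows are already sorted by (Subject, Session) and that every
    time value ends in a single 's' (e.g. '25s').
    """
    reader = csv.DictReader(lines, skipinitialspace=True)
    result = []
    for (subject, session), group in groupby(reader, key=itemgetter('Subject', 'Session')):
        rows = list(group)  # materialize the group so it can be counted
        averages = {
            col: sum(int(r[col][:-1]) for r in rows) / len(rows)
            for col in TIME_COLUMNS
        }
        result.append({'Subject': subject, 'Session': session, **averages})
    return result


# The question's sample data, held in memory instead of a file.
SAMPLE = """Subject, Session, Course, Size, Category, Sprint, Jog, Walk
John Doe, Session2, 17, 2, Bad, 25s, 36s, 55s
John Doe, Session2, 3, 2, Good, 26s, 35s, 45s
John Doe, Session2, 1, 2, Good, 22s, 31s, 47s
John Doe, Session3, 5, 2, Good, 16s, 32s, 55s
John Doe, Session3, 2, 2, Good, 13s, 24s, 52s
John Doe, Session3, 16, 2, Bad, 15s, 26s, 49s
"""

for row in average_times(io.StringIO(SAMPLE)):
    print(row)
```

Materializing each group with `list(group)` replaces the manual counter; it is fine here because a single group is small, while the reader as a whole is still consumed lazily.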
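If your data isn't already sorted, you can replace the reading lines with the following, which reads in the data and sorts it by the same key used in groupby. In this version, however, the data must be small enough to load into memory all at once.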
sortkey = itemgetter('Subject', 'Session')
data = sorted(reader, key=sortkey)
for (subject, session), group in groupby(data, key=sortkey):
    ...
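Why the sort matters: `groupby` only merges *consecutive* items with equal keys. The small demonstration below uses three hypothetical rows (values taken from the question, deliberately reordered) to show that unsorted input splits one session into two groups, while sorting by the same key first yields one group per key.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, out of order: Session2 appears twice, separated by Session3.
rows = [
    {'Subject': 'John Doe', 'Session': 'Session2', 'Sprint': '25s'},
    {'Subject': 'John Doe', 'Session': 'Session3', 'Sprint': '16s'},
    {'Subject': 'John Doe', 'Session': 'Session2', 'Sprint': '26s'},
]

sortkey = itemgetter('Subject', 'Session')

# Without sorting, groupby starts a new group every time the key changes: three groups.
unsorted_keys = [key for key, _ in groupby(rows, key=sortkey)]

# Sorting by the same key first makes equal keys adjacent: one group per key.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=sortkey), key=sortkey)]

print(unsorted_keys)
print(sorted_keys)
```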

User
Answer 2 · edited 2024-06-10 11:56:28

Since you want the averages grouped by subject and session, just combine that information into a single unique key:
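import csv

times = {}
with open('yourfile.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        key = row[0] + row[1]
        # Keep one list of times per column: Sprint, Jog, Walk.
        columns = times.setdefault(key, [[], [], []])
        for column, entry in zip(columns, row[-3:]):
            column.append(entry)

# Strip the trailing 's' and average each column separately.
average = {
    k: [sum(int(entry[:-1]) for entry in column) / len(column) for column in columns]
    for k, columns in times.items()
}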
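This assumes that the first two entries really do have the same regular structure as in your example, and that concatenating the first two entries of each row never produces an ambiguous key. To guarantee that, you can insert a special separator between them in the key. If you are also the one storing the data: putting each column's unit in the column header saves conversion work later and avoids storing redundant information.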
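Concatenated string keys can collide, since for example `'ab' + 'c' == 'a' + 'bc'`. A tuple key avoids that without having to pick a separator. Below is a minimal sketch of the dictionary approach using a `(Subject, Session)` tuple as the key and per-column running sums; the function name `average_by_key` and the reordered in-memory sample are hypothetical. Unlike the groupby version, it does not need the rows to be sorted.

```python
import csv
import io
from collections import defaultdict

# Hypothetical constant: the time columns to average.
TIME_COLUMNS = ('Sprint', 'Jog', 'Walk')


def average_by_key(lines):
    """Average each time column per (Subject, Session) tuple key.

    Works on unsorted rows; assumes every time value ends in a single 's'.
    """
    reader = csv.DictReader(lines, skipinitialspace=True)
    # key -> [row count, sum of Sprint, sum of Jog, sum of Walk]
    totals = defaultdict(lambda: [0] * (len(TIME_COLUMNS) + 1))
    for row in reader:
        key = (row['Subject'], row['Session'])  # tuple key: no separator needed
        acc = totals[key]
        acc[0] += 1
        for i, col in enumerate(TIME_COLUMNS, start=1):
            acc[i] += int(row[col][:-1])
    return {
        key: {col: acc[i] / acc[0] for i, col in enumerate(TIME_COLUMNS, start=1)}
        for key, acc in totals.items()
    }


# The question's rows with the sessions interleaved, i.e. not sorted.
SAMPLE = """Subject, Session, Course, Size, Category, Sprint, Jog, Walk
John Doe, Session3, 5, 2, Good, 16s, 32s, 55s
John Doe, Session2, 17, 2, Bad, 25s, 36s, 55s
John Doe, Session3, 2, 2, Good, 13s, 24s, 52s
John Doe, Session2, 3, 2, Good, 26s, 35s, 45s
John Doe, Session3, 16, 2, Bad, 15s, 26s, 49s
John Doe, Session2, 1, 2, Good, 22s, 31s, 47s
"""

averages = average_by_key(io.StringIO(SAMPLE))
for key, values in sorted(averages.items()):
    print(key, values)
```

Keeping only a count and running sums per key means memory grows with the number of distinct (Subject, Session) pairs rather than the number of rows.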
