How to group a list by value without causing an AttributeError

Posted 2024-05-29 00:20:27


I have a CSV, OutputA, in the following format:

Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90

I am trying to produce an output CSV that gives each team's total points, each team's average points, and its number of riders.

So the output would be:

Team,Points,AvgPoints,NumOfRiders
Team1,190,95,2
Team2,95,95,1

I have a function that converts each row into a namedtuple:

import csv
from collections import namedtuple
from itertools import groupby

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row

I then sort these rows by the Team field (index 3) into a list:

moutputA = sorted(list(csv_to_tuple("Male/outputA.csv")), key=lambda k: k[3])

This returns a list like the following:

[CategoryResults(Position='13', Category='A', Name='Marek', Team='1', Points='48'), CategoryResults(Position='7', Category='A', Name='', Team='1', Points='70')]

I may be wrong, but I believe everything up to this point is correct.

I am trying to build a new list of teams, each with its points (not yet summed).

For example:

[Team 1(1,2,3,4,5)]
[Team 2 (6,9,10)]
etc.

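My idea is that I could then count how many distinct point values each team has (which equals its number of riders). However, when I try to group the list, I have the following code: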

Clubs = []
Club_Points = []
for Names, Club in groupby(moutputA, lambda x: x[3]):
    for Teams in Names:
        Clubs.append(list(Teams))

for Club, Points in groupby(moutputA, lambda x: x[4]):
    for Point in Clubs:
        Club_Points.append(list(Point))

print(Clubs)

But this produces the following error:

    Teams.append(list(Team))
AttributeError: 'itertools._grouper' object has no attribute 'append'
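
For context, `itertools.groupby` yields `(key, group)` pairs, in that order, and each group is a one-shot iterator (an `itertools._grouper`), not a list. The traceback suggests a name bound to that iterator was being used as if it were a list. Here is a minimal sketch of the unpacking order, using made-up rows and variable names that are assumptions for illustration only:

```python
from itertools import groupby

# Hypothetical sample rows: (name, team, points), already sorted by team.
rows = [("James", "Team 1", 100), ("Tom", "Team 1", 90), ("Mark", "Team 2", 95)]

points_by_team = {}
for team, group in groupby(rows, key=lambda r: r[1]):
    # `team` is the key; `group` is an itertools._grouper iterator, not a list.
    # Materialise it here, because it is consumed once groupby advances.
    points_by_team[team] = [r[2] for r in group]

print(points_by_team)  # {'Team 1': [100, 90], 'Team 2': [95]}
```

The lists are created by the loop body, and `.append` is never called on the group object itself.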

3 answers

This all becomes much easier if you just use pandas. See the code below:

import pandas as pd
import numpy as np

df = pd.read_csv(input_path)

teams = list(set(df['Team'])) # unique list of all the teams
num_teams = len(teams)

points = np.empty(shape=num_teams)
avg_points = np.empty(shape=num_teams)
num_riders = np.empty(shape=num_teams)

for i in range(num_teams):
    # find all rows where the entry in the 'Team' column
    # is the same as teams[i]
    req = df.loc[df['Team'] == teams[i]]
    points[i] = np.sum(req['Points'])
    num_riders[i] = len(req)
    avg_points[i] = points[i] / num_riders[i]

dict_out = {
    'Team':teams,
    'Points':points,
    'AvgPoints':avg_points,
    'NumOfRiders':num_riders
}
df_out = pd.DataFrame(data=dict_out)
df_out.to_csv(output_path, index=False)  # index=False leaves out the row-index column

If data.csv contains:

Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90

then this script:

import csv
from collections import namedtuple
from itertools import groupby
from statistics import mean

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        next(file) # skip header
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row

moutputA = sorted(csv_to_tuple("data.csv"), key=lambda k: k.Team)

out = []
for team, group in groupby(moutputA, lambda x: x.Team):
    group = list(group)
    d = {}
    d['Team'] = team
    d['Points'] = sum(int(i.Points) for i in group)
    d['AvgPoints'] = mean(int(i.Points) for i in group)
    d['NumOfRider'] = len(group)
    out.append(d)


with open('data_out.csv', 'w', newline='') as csvfile:
    fieldnames = ['Team', 'Points', 'AvgPoints', 'NumOfRider']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for row in out:
        writer.writerow(row)

produces data_out.csv:

Team,Points,AvgPoints,NumOfRider
Team 1,190,95,2
Team 2,95,95,1


Here is a start. You should be able to work out how to get what you want from it.

import csv, io
from collections import namedtuple
from itertools import groupby

data = '''\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
'''

b = io.StringIO(data)
next(b)

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)


def csv_to_tuple(file):
    reader = csv.reader(file)
    for row in map(Results._make, reader):
        yield row


rows = sorted(list(csv_to_tuple(b)), key=lambda k: k[3])

for TeamName, TeamRows in groupby(rows, lambda x: x[3]):
    print(TeamName)
    TeamPoints = [row.Points for row in TeamRows]
    print(TeamPoints)
    print()
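
As a possible alternative to the groupby loop above: `groupby` only merges adjacent rows, which is why the rows are sorted first. Collecting points into a dictionary keyed by team avoids the sort. The sketch below swaps in `csv.DictReader` and `collections.defaultdict` rather than the namedtuple approach, and the names `points_by_team` and `summary` are assumptions introduced for illustration:

```python
import csv
import io
from collections import defaultdict

# Same sample data as above, inlined so the sketch runs on its own.
data = """\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
"""

# Map each team name to the list of its riders' points (as integers).
points_by_team = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    points_by_team[row["Team"]].append(int(row["Points"]))

# One (team, total, average, rider count) tuple per team, in first-seen order.
summary = [
    (team, sum(pts), sum(pts) / len(pts), len(pts))
    for team, pts in points_by_team.items()
]
print(summary)  # [('Team 1', 190, 95.0, 2), ('Team 2', 95, 95.0, 1)]
```

Here the rider count is the length of each list rather than the number of distinct point values, so two riders with the same score are still counted separately.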
