How do I find the average of specific numbers in a CSV file?

Posted 2024-04-26 20:56:21


import csv

with open('sortedsimpsons_episodes.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    print("Season 1")
    for idx,row in enumerate(csv_reader):
        if idx>=1 and idx<=13:
            print(f'"{row[1]}" is an episode in season {row[4]}, that has {row[7]} million views and an imdb rating of {row[9]}')

viewsAverage = round((30.3 + 30.4 + 27.6 + 33.5 + 31.2 + 27.1 + 26.7 + 25.4 + 20.2 + 27.4 + 28 + 27.1 + 27.5) / 13,2)
imdbAverage = round((7.4 + 8.3 + 7.9 + 7.5 + 7.8 + 7.9 + 8.2 + 7.8 + 7.8 + 7.6 + 7.7 + 8.1 + 7.5) / 13,2)
print("The average amount of views in season 1 is: "+str(viewsAverage)+ " million.")
print("The average imdb rating of season 1 is: " +str(imdbAverage))
csv_file.close()

Contents of the CSV file, printed in that format:

"Krusty Gets Busted" is an episode in season 1, that has 30.4 million views and an imdb rating of 8.3.
"The Call of the Simpsons" is an episode in season 1, that has 27.6 million views and an imdb rating of 7.9.
"Life on the Fast Lane" is an episode in season 1, that has 33.5 million views and an imdb rating of 7.5.
"The Crepes of Wrath" is an episode in season 1, that has 31.2 million views and an imdb rating of 7.8.
"Some Enchanted Evening" is an episode in season 1, that has 27.1 million views and an imdb rating of 7.9.
"Simpsons Roasting on an Open Fire" is an episode in season 1, that has 26.7 million views and an imdb rating of 8.2.
"Bart the Genius" is an episode in season 1, that has 24.5 million views and an imdb rating of 7.8.
"There's No Disgrace Like Home" is an episode in season 1, that has 26.2 million views and an imdb rating of 7.8.
"Moaning Lisa" is an episode in season 1, that has 27.4 million views and an imdb rating of 7.6.
"The Telltale Head" is an episode in season 1, that has 28 million views and an imdb rating of 7.7.
"Bart the General" is an episode in season 1, that has 27.1 million views and an imdb rating of 8.1.
"Homer's Odyssey" is an episode in season 1, that has 27.5 million views and an imdb rating of 7.5.
"Bart Gets an "F"" is an episode in season 2, that has 33.6 million views and an imdb rating of 8.2.
"Two Cars in Every Garage and Three Eyes on Every Fish" is an episode in season 2, that has 26.1 million views and an imdb rating of 8.1.
"Dead Putting Society" is an episode in season 2, that has 25.4 million views and an imdb rating of 8.
"Bart the Daredevil" is an episode in season 2, that has 26.2 million views and an imdb rating of 8.4.

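Printed in full with Python, the file is very long: it covers 27 seasons. I want to find the average views and rating for each season, but I only know how to do it manually, as in the code above. The code works and prints what I want, but doing it this way for every season would take me a very long time. How can I find the average views for a season without typing in all the numbers by hand?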


3 answers

To average the views and ratings per season, you first need to sort the rows by season and then group them.

I assume that:

  • row[1] is the title
  • row[4] is the season
  • row[7] is the number of views
  • row[9] is the rating

So I imagine you have something like this (I replaced the unknown values with None):

rows = [
    ('title1', None, None, None, 1, None, None, 30.4, None, 8.5),
    ('title2', None, None, None, 2, None, None, 27.5, None, 6.5),
    ('title3', None, None, None, 1, None, None, 40.2, None, 4.0),
    ('title4', None, None, None, 1, None, None, 21.9, None, 2.6),
]
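In the real file, csv.reader yields every field as a string, so the season and the numeric columns need converting before the rows look like the sample above. Here is a minimal sketch. It assumes the column positions listed above, a header row on the first line, and integer season numbers. `load_rows` is a hypothetical helper name.

```python
import csv


def load_rows(lines):
    """Build tuples like the sample `rows` from CSV text.

    Assumes a header row and that columns 4, 7 and 9 hold the season,
    the views (in millions) and the IMDb rating.
    """
    reader = csv.reader(lines, delimiter=',')
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        row[4] = int(row[4])    # season number (assumed to be an integer)
        row[7] = float(row[7])  # views, in millions
        row[9] = float(row[9])  # IMDb rating
        rows.append(tuple(row))
    return rows


# Hypothetical usage with the file from the question:
# with open('sortedsimpsons_episodes.csv', newline='') as csv_file:
#     rows = load_rows(csv_file)
```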

To sort and group the rows, and to pull values out of each row, you can build key functions with operator.itemgetter:

import operator

get_season = operator.itemgetter(4)
get_views = operator.itemgetter(7)
get_rate = operator.itemgetter(9)
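operator.itemgetter(n) returns a callable that does `row[n]`. When given several indices, it returns a tuple of those fields instead. Here is a quick check against a row shaped like the sample above:

```python
import operator

row = ('title1', None, None, None, 1, None, None, 30.4, None, 8.5)

get_season = operator.itemgetter(4)
print(get_season(row))  # 1

# With several indices, itemgetter returns a tuple of those fields
get_stats = operator.itemgetter(4, 7, 9)
print(get_stats(row))  # (1, 30.4, 8.5)
```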

With these, you can compute the averages:

import itertools

rows.sort(key=get_season)
for season, group in itertools.groupby(rows, key=get_season):
    group = list(group)
    count = len(group)
    total_views = sum(get_views(row) for row in group)
    total_rate = sum(get_rate(row) for row in group)
    mean_views = total_views / count
    mean_rate = total_rate / count
    print(f"season {season} - views: {mean_views:.2f}, rate: {mean_rate:.2f}")

You get:

season 1 - views: 30.83, rate: 5.03
season 2 - views: 27.50, rate: 6.50
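The sort before groupby matters. itertools.groupby only merges *consecutive* items that share a key, so unsorted input splits one season into several groups. Here is a small illustration using bare season numbers:

```python
import itertools

seasons = [1, 2, 1, 1]

# Unsorted: season 1 shows up as two separate groups
unsorted_keys = [key for key, _ in itertools.groupby(seasons)]
print(unsorted_keys)  # [1, 2, 1]

# Sorted first: exactly one group per season
sorted_keys = [key for key, _ in itertools.groupby(sorted(seasons))]
print(sorted_keys)  # [1, 2]
```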

As mentioned in another answer, you can also use the statistics module:

import itertools
import statistics

rows.sort(key=get_season)
for season, group in itertools.groupby(rows, key=get_season):
    group = list(group)
    mean_views = statistics.mean(get_views(row) for row in group)
    mean_rate = statistics.mean(get_rate(row) for row in group)
    print(
        f"season {season} - views: {mean_views:.2f}, rate: {mean_rate:.2f}")

Why not keep a running total inside your loop and divide it by the count?

import csv

viewsTotal = 0
imdbTotal = 0
total = 0
with open('sortedsimpsons_episodes.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    print("Season 1")
    for idx, row in enumerate(csv_reader):
        if idx >= 1 and idx <= 13:
            # csv.reader returns strings, so convert before adding
            viewsTotal += float(row[7])
            imdbTotal += float(row[9])
            total += 1  # count the rows actually summed
            print(f'"{row[1]}" is an episode in season {row[4]}, that has {row[7]} million views and an imdb rating of {row[9]}')
viewsAverage = round(viewsTotal / total, 2)
imdbAverage = round(imdbTotal / total, 2)
print("The average amount of views in season 1 is: " + str(viewsAverage) + " million.")
print("The average imdb rating of season 1 is: " + str(imdbAverage))
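The loop above covers only rows 1 to 13. The same running-total idea extends to every season if you keep one total and one count per season in dictionaries, so no per-episode lists need to be stored. Here is a sketch. It assumes a header row and the same column positions (season in row[4], views in row[7], rating in row[9]). `season_averages` is a hypothetical name.

```python
import csv
from collections import defaultdict


def season_averages(lines):
    """Return {season: (average views, average rating)} using running totals."""
    views_total = defaultdict(float)
    imdb_total = defaultdict(float)
    count = defaultdict(int)

    reader = csv.reader(lines, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        season = row[4]
        views_total[season] += float(row[7])
        imdb_total[season] += float(row[9])
        count[season] += 1

    # Divide each season's totals by that season's episode count
    return {
        season: (round(views_total[season] / count[season], 2),
                 round(imdb_total[season] / count[season], 2))
        for season in count
    }


# Hypothetical usage with the file from the question:
# with open('sortedsimpsons_episodes.csv', newline='') as csv_file:
#     for season, (views, imdb) in season_averages(csv_file).items():
#         print(f"Season {season}: {views} million views, imdb rating {imdb}")
```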

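I'm not sure whether the print statements and averages at the bottom of your code are meant to run after the `with` block that reads csv_file. Also, you don't need .close(), because `with open(...)` closes the file automatically when the block ends.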
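One way to confirm that the `with` block closes the file is to check the file object's `closed` attribute, which becomes True once the block exits. This sketch opens a throwaway temporary file so it runs anywhere:

```python
import os
import tempfile

# Create an empty temporary file to open
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)

with open(path) as csv_file:
    print(csv_file.closed)  # False: still open inside the block

print(csv_file.closed)  # True: closed automatically when the block ended
os.remove(path)
```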

You can use a dictionary to store the list of IMDb ratings, or of viewings, for each season.

Python has a handy defaultdict, which creates an empty list automatically the first time each season is seen:

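import csv
from collections import defaultdict

ratings = defaultdict(list)
viewings = defaultdict(list)

with open('sortedsimpsons_episodes.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        # csv.reader yields strings; convert the numbers so they can be averaged
        season, viewing, rating = row[4], float(row[7]), float(row[9])

        ratings[season].append(rating)
        viewings[season].append(viewing)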

You can then take a season's list of ratings and compute its mean, for example:

>>> from statistics import mean
>>> mean(ratings['1'])
7.807692307692307
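To get the output the question asks for (one line per season, for all 27 seasons), you can loop over the two dictionaries together. Here is a sketch. It assumes `ratings` and `viewings` were filled as above with float values and string season keys such as '1', which are sorted numerically via key=int. `season_report` is a hypothetical name.

```python
from statistics import mean


def season_report(viewings, ratings):
    """Return one summary line per season, ordered by season number."""
    lines = []
    for season in sorted(viewings, key=int):
        views_avg = round(mean(viewings[season]), 2)
        imdb_avg = round(mean(ratings[season]), 2)
        lines.append(f"Season {season}: average views {views_avg} million, "
                     f"average imdb rating {imdb_avg}")
    return lines


# Usage with the dictionaries built above:
# for line in season_report(viewings, ratings):
#     print(line)
```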
