Counting occurrences of an item in a list, given another item

Posted 2024-04-24 23:02:41


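I wrote the following Python code to parse .csv files and print two columns, date and rating. Now I want to count the ratings per date. For example, if 2018-4-01 appears 4 times with ratings 1, 4, 1, 4, I want to print: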

2018-4-01 1 2
2018-4-01 4 2

The code I tried:

import glob
import csv
import re
from collections import Counter

path = "ReviewsSep2018/*.csv"
mylist = []

# Collect a (date, rating) pair from every row of every matching file.
for filename in glob.glob(path):
    print(filename)
    with open(filename, newline='', encoding='utf-16') as f:
        reader = csv.reader(f)
        for row in reader:
            # Column 5 holds the date; rows without one (e.g. a header) are skipped.
            result = re.search(r'\d+\W\d+\W\d+', row[5])
            if result:
                line = result.group()
                # Column 9 holds the rating.
                mylist.append((line, row[9]))
        print(mylist)

for i in mylist:
    print(i[0], i[1])

Output of the code sample:

2018-09-01 1
2018-09-01 5
2018-09-01 2
2018-09-01 1
2018-08-23 1
2018-09-01 4
2018-09-01 4
2018-09-01 5
2018-09-01 2
2018-09-02 1
2018-09-02 5
2018-09-02 5

Expected result:

date       star   count
2018-09-01   1        2
2018-09-01   2        3
2018-09-01   5        2
2018-09-02   5        2
2018-08-23   1        1

2 answers

Turn your mylist into a Counter:

mycount = Counter()

Then, instead of appending (date, rating) tuples to a list, increment the count for each tuple:

mycount[(line,row[9])] += 1

Finally, display the result with:

for (date, rating), count in mycount.items():
    print(date, rating, count)
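
Put together, the Counter approach might look like the sketch below. It is only an illustration: the helper name count_pairs is hypothetical, and the hard-coded pairs list stands in for the (date, rating) tuples that the question's loop extracts from the .csv files. Sorting the items before printing is an added choice to group the dates together, not part of the answer above.

```python
from collections import Counter

# Stand-in data: the (date, rating) pairs from the question's output listing.
# In the real script these would come from the csv rows (line, row[9]).
pairs = [
    ("2018-09-01", "1"), ("2018-09-01", "5"), ("2018-09-01", "2"),
    ("2018-09-01", "1"), ("2018-08-23", "1"), ("2018-09-01", "4"),
    ("2018-09-01", "4"), ("2018-09-01", "5"), ("2018-09-01", "2"),
    ("2018-09-02", "1"), ("2018-09-02", "5"), ("2018-09-02", "5"),
]


def count_pairs(pairs):
    """Count how often each (date, rating) combination occurs."""
    counts = Counter()
    for date, rating in pairs:
        counts[(date, rating)] += 1
    return counts


mycount = count_pairs(pairs)

# Sort by (date, rating) so rows for the same date appear together.
for (date, rating), count in sorted(mycount.items()):
    print(date, rating, count)
```

Because the keys are tuples, each distinct date and rating combination gets its own counter entry, which is what the per-date rating count needs.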

If you don't mind using the pandas library, you can use groupby after parsing the data. In my opinion, pandas also has a very good .csv reader.

import pandas as pd

(pd.DataFrame([['2018-09-01', 1],
              ['2018-09-01', 5],
              ['2018-09-01', 2],
              ['2018-09-01', 1],
              ['2018-08-23', 1],
              ['2018-09-01', 4],
              ['2018-09-01', 4],
              ['2018-09-01', 5],
              ['2018-09-01', 2],
              ['2018-09-02', 1],
              ['2018-09-02', 5],
              ['2018-09-02', 5]],
             columns=['date', 'star']
            )
 .assign(count=1)
 .groupby(['date', 'star'])
 .count()
)
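
As a variation on the snippet above, the same grouping can probably be written with size() instead of adding a helper column, which also makes it easy to get a flat date / star / count table. This is a sketch on a small made-up subset of the data, not the answerer's exact method; the variable names df and counts are placeholders.

```python
import pandas as pd

# Small illustrative subset of (date, star) rows.
df = pd.DataFrame(
    [
        ["2018-09-01", 1],
        ["2018-09-01", 5],
        ["2018-09-01", 1],
        ["2018-09-02", 5],
        ["2018-09-02", 5],
    ],
    columns=["date", "star"],
)

# size() counts the rows in each (date, star) group;
# reset_index(name="count") turns the result back into a flat table.
counts = df.groupby(["date", "star"]).size().reset_index(name="count")

print(counts.to_string(index=False))
```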
