Python 3: counting word frequencies from a CSV file into a dictionary

Posted 2024-05-23 19:22:43


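I am trying to write a function that reads a CSV file of student volunteers who hold different degrees. The function should build a dictionary in which each key is a degree and each value is the number of times that degree occurs.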

The data is organized as follows:

name    degree     email

ABC     PhD.       abd@gmail.com
CDE     Ph.D.      cde@gmail.com
FGH     MD,PHD     fgh@gmail.com

The dictionary should be built as follows:

^{pr2}$

2 answers

I believe your problem is that the following code has no effect unless you assign its result to a variable.

[word.replace(".", "") for word in student_degree_list]
[word.lower() for word in student_degree_list]

Also, if a degree occurs once, shouldn't its count be set to 1 rather than 0?
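Both points can be illustrated with a minimal sketch. The sample list below is hypothetical and stands in for the degree column from the question; this is not the answer's original working code.

```python
# Hypothetical sample data standing in for the question's degree column.
student_degree_list = ["PhD.", "Ph.D.", "MD"]

# A bare comprehension builds a new list and then discards it,
# so the original list is left unchanged.
[word.replace(".", "") for word in student_degree_list]
assert student_degree_list == ["PhD.", "Ph.D.", "MD"]

# Rebinding the name keeps the cleaned values.
student_degree_list = [
    word.replace(".", "").lower() for word in student_degree_list
]

# The first occurrence of a degree starts its count at 1, not 0.
degree_counts = {}
for degree in student_degree_list:
    degree_counts[degree] = degree_counts.get(degree, 0) + 1

print(degree_counts)
```

Using `dict.get` with a default of 0 is one way to avoid a separate branch for the first occurrence of a key.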

Working code:

^{pr2}$
import csv
from collections import Counter, defaultdict  # defaultdict is used for `columns` below

columns = defaultdict(list) # each value in each column is appended to a list

with open('csv_file.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

Credit for the csv reader code

^{pr2}$

Option 1

output_dict_counter_version = dict(Counter(degree_list_clean))
print(output_dict_counter_version)

Option 2

degree_frequency_dict = {}

for deg in degree_list_clean:
    if deg in degree_frequency_dict:
        degree_frequency_dict[deg] += 1
    else:
        degree_frequency_dict[deg] = 1

print(degree_frequency_dict)    

Using pandas

import pandas as pd
from collections import Counter

data = pd.read_csv("csv_file.csv")
degree_list = data['degree'].tolist()


degree_list_clean = []

for cad_degrees in degree_list:
    cad_degrees_lst = cad_degrees.split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

print(dict(Counter(degree_list_clean)))



'''
          Input
name,degree,email
ABC,PhD. ,abd@gmail.com
CDE,Ph.D. ,cde@gmail.com
FGH, MD PHD ,fgh@gmail.com

           Output
{'phd': 3, 'md': 1}
'''
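Since the question asks for a function, the same idea can be sketched with only the standard library. The function name `count_degrees` and the in-memory sample are assumptions made for illustration. The sketch also assumes that a field such as `MD,PHD` from the question's data is quoted in the CSV, and it splits each field on commas as well as whitespace.

```python
import csv
import io
import re
from collections import Counter


def count_degrees(csv_file):
    """Return a dict mapping each normalized degree to its frequency."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Split on commas and/or whitespace so "MD,PHD" and "MD PHD" both work.
        for degree in re.split(r"[,\s]+", row["degree"].strip()):
            if degree:
                # Normalize: drop periods and lowercase ("Ph.D." -> "phd").
                counts[degree.replace(".", "").lower()] += 1
    return dict(counts)


# Hypothetical in-memory sample; pass an open file object for a real CSV.
sample = io.StringIO(
    "name,degree,email\n"
    "ABC,PhD. ,abd@gmail.com\n"
    "CDE,Ph.D. ,cde@gmail.com\n"
    'FGH,"MD,PHD",fgh@gmail.com\n'
)
degree_counts = count_degrees(sample)
print(degree_counts)
```

For a file on disk, the call would look like `with open('csv_file.csv', newline='') as f: count_degrees(f)`.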
