Counting the frequency of words in a column in Python

User

Answer 1 · edited 2024-05-16 22:55:12

data = """Name\tHour\tLocation
A\t4\tSan Fransisco
B\t2\tNew York
C\t4\tNew York
D\t7\tDenton
E\t8\tBoston
F\t1\tBoston
"""

import csv
import io
from collections import Counter


# Wrap the string in a file-like object so csv.reader can iterate over it
input_stream = io.StringIO(data)
reader = csv.reader(input_stream, delimiter='\t')

next(reader)  # skip the header row
cities = [row[2] for row in reader]  # the third column holds Location

for k, v in Counter(cities).items():
    print("%s appears %d times" % (k, v))

Output:

San Fransisco appears 1 times
New York appears 2 times
Denton appears 1 times
Boston appears 2 times
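Answer 1 relies on Location being the third column (row[2]). A variant, assuming the header row always contains a column named exactly Location, uses csv.DictReader to look the column up by name instead, and Counter.most_common() to print the cities from most to least frequent. This is a sketch, not part of the original answer.

```python
import csv
import io
from collections import Counter

data = (
    "Name\tHour\tLocation\n"
    "A\t4\tSan Fransisco\n"
    "B\t2\tNew York\n"
    "C\t4\tNew York\n"
    "D\t7\tDenton\n"
    "E\t8\tBoston\n"
    "F\t1\tBoston\n"
)

# DictReader consumes the header row itself and keys each row by column name,
# so the code no longer depends on "Location" being the third column.
reader = csv.DictReader(io.StringIO(data), delimiter="\t")
counts = Counter(row["Location"] for row in reader)

# most_common() yields (value, count) pairs from most to least frequent
for city, n in counts.most_common():
    print(f"{city} appears {n} times")
```

Looking the column up by name costs nothing extra and keeps working if columns are reordered or added later.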

User

Answer 2 · edited 2024-05-16 22:55:12

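If the file is not too large, the simplest approach is:

1. Read the file line by line.
2. Append each line's location value to a list.
3. Build a set of the unique values from that list.
4. Count how many times each unique value occurs in the list.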
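The four steps can be sketched as follows. The helper name count_locations is hypothetical, and the sketch assumes tab-separated input with a header row and the location in the third column, as in the sample data from Answer 1.

```python
import io


def count_locations(lines):
    """Count how often each location occurs in tab-separated lines.

    Hypothetical helper: assumes a header row followed by rows whose
    third column (index 2) is the location.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header row (no-op for empty input)

    # Steps 1-2: read line by line and collect each location in a list
    locations = []
    for line in rows:
        if not line.strip():
            continue  # ignore blank lines, e.g. at the end of the file
        fields = line.rstrip("\r\n").split("\t")
        locations.append(fields[2])

    # Step 3: the set of unique locations
    uniques = set(locations)

    # Step 4: count each unique value (list.count rescans the list each time)
    return {loc: locations.count(loc) for loc in uniques}


sample = io.StringIO(
    "Name\tHour\tLocation\n"
    "A\t4\tSan Fransisco\n"
    "B\t2\tNew York\n"
    "C\t4\tNew York\n"
)
counts = count_locations(sample)
print(sorted(counts.items()))
```

With a real file, pass the open file object instead, e.g. `with open("test.txt") as f: print(count_locations(f))`. Because list.count rescans the whole list once per unique value, this suits small files; collections.Counter does the same job in a single pass.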

User

Answer 3 · edited 2024-05-16 22:55:12

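I'm not sure what the delimiter is, but the example appears to use 4 spaces, so here is a solution for that case.

If your data is actually tab-delimited, use @MariaZverina's answer.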

import collections

with open('test.txt') as f:
    next(f)  # skip the header line
    # rpartition splits at the last run of 4 spaces; [-1] is the Location field
    print(collections.Counter(line.rstrip().rpartition('    ')[-1]
                              for line in f if line.strip()))

Output:

Counter({'New York': 2, 'Boston': 2, 'San Fransisco': 1, 'Denton': 1})
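If the column spacing is inconsistent (some gaps only two or three spaces wide, or some rows using tabs), a regular-expression split is a more tolerant sketch. It assumes fields are separated by either a single tab or a run of two or more whitespace characters, and that no location name itself contains two consecutive spaces; the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample with inconsistent spacing: gaps of 2+ spaces or a tab,
# while a location such as "New York" contains only single spaces.
lines = [
    "Name    Hour    Location",
    "A       4       San Fransisco",
    "B\t2\tNew York",
    "C  4  New York",
]

# Split on a run of two or more whitespace characters, or on a single tab
SEPARATOR = re.compile(r"\s{2,}|\t")

counts = Counter(
    SEPARATOR.split(line.strip())[-1]  # the last field is the location
    for line in lines[1:]              # skip the header row
    if line.strip()                    # ignore blank lines
)
print(counts)
```

The `\s{2,}` alternative is tried first, so a tab followed by padding spaces is consumed as one separator rather than producing an empty field.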
