How can I take some words and print out the frequency of each phrase/word?

Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
. . .

def read_albums(filename) :
    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []
            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1
    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))

3 answers

User

Answer 1 · edited 2024-06-06 13:42:03

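My approach would be to use the split() method to break each line of the file into a list of its component tokens. You can then take the band name (the first token in the list) and start adding those names to a dictionary to keep track of the counts: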

import operator

def main():
    band_counts = {}

    # build a dictionary that adds each band as it is listed, then increments the count for re-lists
    with open("albums.txt", "r") as f:
        for line in f:
            line_items = line.split("-")  # break up the line into individual tokens
            band = line_items[0].strip()  # band name without surrounding whitespace

            # don't want to add blank lines to the band list
            if not band:
                continue

            if band in band_counts:
                band_counts[band] += 1  # band already in the counts, increment the count
            else:
                band_counts[band] = 1   # band not in the counts yet, add it with a count of 1

    # create a list of (band, count) pairs sorted by count
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1))

    for item in sorted_list:
        print(item[0], ":", item[1])

if __name__ == "__main__":
    main()

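Notes:

I created the sorted results as suggested in this answer: Sort a Python dictionary by value
If you're new to Python, take a look at Google's Python class. I found it helpful when I was starting out: https://developers.google.com/edu/python/?csw=1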
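
The sort-by-value step from the first note can be tried on its own. This is a minimal sketch, assuming Python 3; the counts in `band_counts` are made up for illustration and are not taken from the full file.

```python
import operator

# Hypothetical counts, standing in for the dictionary built from albums.txt.
band_counts = {"Beatles": 4, "U2": 3, "Nirvana": 1, "Radiohead": 2}

# items() yields (band, count) pairs; itemgetter(1) sorts on the count.
# reverse=True puts the most frequent band first.
sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

for band, count in sorted_list:
    print(band, ":", count)
```

Dropping `reverse=True` gives ascending order, with the least frequent band first.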

User

Answer 2 · edited 2024-06-06 13:42:03

If you want something concise, use "defaultdict" and "sorted":

from collections import defaultdict

bands = defaultdict(int)  # missing bands start at a count of 0
with open('tmp.txt') as f:
    for line in f:
        band = line.split(' - ')[0].strip()
        if band:  # skip blank lines
            bands[band] += 1

# most frequent band first
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))
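
No "is the band already in the dictionary" check is needed here because `defaultdict(int)` creates a missing key with the value `int()`, which is 0, the first time it is accessed. A small sketch of that behaviour, with a few sample lines inlined instead of read from a file (the `lines` list is an assumption for illustration):

```python
from collections import defaultdict

# Sample lines inlined for illustration; the answer reads them from a file.
lines = [
    "Beatles - Revolver (1966)",
    "Nirvana - Nevermind (1991)",
    "Beatles - Abbey Road (1969)",
]

bands = defaultdict(int)
for line in lines:
    band = line.split(" - ")[0]
    bands[band] += 1  # a missing key starts at int() == 0, then becomes 1

print(dict(bands))
```

Note that merely reading `bands["Some Band"]` also inserts that key with a count of 0, so use `in` or `.get()` when you only want to look something up.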

User

Answer 3 · edited 2024-06-06 13:42:03

You're right, there is a simpler way, using Counter:

from collections import Counter

with open('bandfile.txt') as f:
   counts = Counter(line.split('-')[0].strip() for line in f if line)

for band, count in counts.most_common():
    print("{0}:{1}".format(band, count))

What exactly is this doing: line.split('-')[0].strip() for line in f if line?

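That expression is a compact form of the following loop:

temp_list = []
for line in f:
    if line:
        temp_list.append(line.split('-')[0].strip())
counts = Counter(temp_list)

Unlike the loop above, however, it does not build an intermediate list. Instead it creates a generator expression, a more memory-efficient way of stepping through the items one at a time, and that generator is passed as the argument to Counter.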
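
One caveat worth checking: a line read from a file keeps its trailing newline, so a blank line is the string "\n", which is truthy and passes an `if line` filter. Below is a hedged sketch of a variant that filters on `line.strip()` instead. It uses `io.StringIO` as a stand-in for the real file, and the sample text is an assumption for illustration.

```python
import io
from collections import Counter

# io.StringIO stands in for the opened file; the content is sample data.
f = io.StringIO(
    "Beatles - Revolver (1966)\n"
    "\n"
    "U2 - The Joshua Tree (1987)\n"
    "\n"
    "Beatles - Abbey Road (1969)\n"
)

# Generator expression: band names are produced one at a time and fed to
# Counter, so no intermediate list is built. line.strip() skips blank lines.
counts = Counter(line.split("-")[0].strip() for line in f if line.strip())

for band, count in counts.most_common():
    print("{0}:{1}".format(band, count))
```

With the plain `if line` test, the two blank lines in this sample would be counted under an empty-string band name.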
