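Python: print counts for all terms in alphabetical order, including zeros

0 votes

3 answers

1660 views

Asked 2025-04-17 16:13

I am processing more than 360 text files and want to count how often certain words occur in each file. Here is my code: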

>>> import os, re
>>> from collections import Counter
>>> cnt = Counter()
>>> def process(filename):
...     words = re.findall(r'\w+', open(filename).read().lower())
...     for word in words:
...         if word in words_fra:
...             cnt[word] += 1
...         if word in words_1:
...             cnt[word] += 1
...     print(cnt)
...     cnt.clear()
...
>>> for filename in os.listdir(r"C:\Users\Cameron\Desktop\Project"):
...     process(filename)

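I have two lists, words_fra and words_1, each containing roughly 10 to 15 words. The current code prints the matched words and their counts, but it omits words whose count is zero, and the output is ordered by frequency.

Here is an example of the output: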

Counter({'prices': 140, 'inflation': 107, 'labor': 46, 'price': 34, 'wage': 27, 'productivity': 26, 'capital': 21, 'workers': 20, 'wages': 19, 'employment': 18, 'investment': 14, 'unemployment': 13, 'construction': 13, 'production': 11, 'inflationary': 10, 'housing': 8, 'credit': 8, 'job': 7, 'industry': 7, 'jobs': 6, 'worker': 4, 'tax': 2, 'income': 2, 'aggregates': 1, 'payments': 1})
Counter({'inflation': 193, 'prices': 118, 'price': 97, 'labor': 58, 'unemployment': 42, 'wage': 32, 'productivity': 32, 'construction': 22, 'employment': 18, 'wages': 17, 'industry': 17, 'investment': 16, 'income': 16, 'housing': 15, 'production': 13, 'job': 13, 'inflationary': 12, 'workers': 9, 'aggregates': 9, 'capital': 5, 'jobs': 5, 'tax': 4, 'credit': 3, 'worker': 2})

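I am reasonably happy with the current format, but I would like the counts of all words to be shown, including words that occur zero times, and I would like the entries ordered alphabetically rather than by frequency.

What should I add to my code to achieve this? It would be even better if the results could go into a tidy CSV, with the words as column headers and the counts as row values.

Thanks!

Edit: above is what the current output looks like; below is how I would like it to look.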

Wordlist="a b c d"
Counter({'c': 4, 'a': 3, 'b': 1})
Counter({'a': 3, 'b': 1, 'c': 4, 'd': 0})

Tags: text processing, file handling, data output, text analysis, CSV format, alphabetical sorting, word frequency, word counts

3 Answers

A Counter returns 0 for any key it has never counted, so you can loop over the combined word lists in sorted order and print each word with its count:

for word in sorted(words_fra + words_1):
    print(word, cnt[word])
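
To see why this works, here is a minimal self-contained sketch. The word lists and the sample text are made up for illustration and stand in for words_fra, words_1 and the contents of one file. It relies only on the fact that looking up a missing key in a Counter gives 0 rather than raising KeyError.

```python
import re
from collections import Counter

# Hypothetical word lists and sample text (assumptions for illustration only).
words_fra = ["wage", "inflation"]
words_1 = ["credit", "prices"]
text = "Prices rose as inflation climbed; prices and inflation fed more inflation."

wanted = set(words_fra + words_1)

# Count only the words we care about.
cnt = Counter(w for w in re.findall(r"\w+", text.lower()) if w in wanted)

# A missing key yields 0, so words that never occurred are still reported.
report = [(word, cnt[word]) for word in sorted(wanted)]
for word, count in report:
    print(word, count)
```

Reading `cnt[word]` for an uncounted word does not add that word to the Counter, so printing `cnt` directly would still omit the zeros; only the explicit loop over the word list shows them.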

Answered 2025-04-17 by Python大师


If you want the result to be a Counter, you need to override Counter's __add__ method so that it keeps zero counts. For example:

In [8]: from collections import Counter

In [9]: Counter({'red': 4, 'blue': 2, 'white': 0}) + Counter({'red': 4, 'blue': 2, 'white': 0})
Out[9]: Counter({'red': 8, 'blue': 4})

In [10]: 
    ...: class Counter(Counter):
    ...:     def __add__(self, other):
    ...:         if not isinstance(other, Counter):
    ...:             return NotImplemented
    ...:         result = Counter()
    ...:         for elem, count in self.items():
    ...:             newcount = count + other[elem]
    ...:             result[elem] = newcount
    ...:         for elem, count in other.items():
    ...:             if elem not in self:
    ...:                 result[elem] = count
    ...:         return result
    ...:     

In [11]: Counter({'red': 4, 'blue': 2, 'white': 0}) + Counter({'red': 4, 'blue': 2, 'white': 0})
Out[11]: Counter({'red': 8, 'blue': 4, 'white': 0})  # <-- the zero count is now kept in the resulting Counter
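
If subclassing feels heavy, one possible alternative is sketched below. It uses the standard `Counter.update()` method, which adds counts in place and, unlike the `+` operator, does not discard entries whose count is zero. The colour data is the same illustrative data used in the session above.

```python
from collections import Counter

first = Counter({'red': 4, 'blue': 2, 'white': 0})
second = Counter({'red': 4, 'blue': 2, 'white': 0})

# The + operator drops entries whose resulting count is zero or negative.
added = first + second

# update() adds counts in place and keeps zero entries.
merged = Counter(first)   # copy so that `first` is left unchanged
merged.update(second)

print(added)
print(merged)
```

This keeps the built-in Counter class untouched, at the cost of mutating one of the operands (here a copy) instead of using an expression.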

Answered 2025-04-17 by Python大师


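To print every word in your word list, loop over the list before you start searching the file and add each word to the result dictionary with an initial count of 0.

To print the words in the right order, use the built-in sorted() function.

Something like this: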

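import re

wordlist = words_fra + words_1
# Start every word at zero so words that never appear are still reported.
cnt = {}
for word in wordlist:
    cnt[word] = 0

with open('foo.html') as f:
    words = re.findall(r'\w+', f.read().lower())
for word in words:
    if word in cnt:
        cnt[word] += 1

# Tuples sort by their first element, so this prints in alphabetical order.
for result in sorted(cnt.items()):
    print("{0} appeared {1} times".format(*result))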

If you want the most frequent words first, sort by count in descending order instead:

for result in sorted(cnt.items(), key=lambda x: x[1], reverse=True):
    print("{0} appeared {1} times".format(*result))
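
The question also asks for CSV output with the words as column headers and the counts as row values. A minimal sketch using the standard csv module might look like the following. The in-memory `texts` dictionary and its file names are placeholders for reading the real files, and the extra `file` column is an assumption added so that rows can be told apart.

```python
import csv
import io
import re

# Hypothetical inputs: in practice wordlist would be words_fra + words_1
# and each text would come from reading one file.
wordlist = ["a", "b", "c", "d"]
texts = {"one.txt": "a c c a b c a c", "two.txt": "d d b"}

columns = sorted(set(wordlist))  # alphabetical column order, no duplicates
out = io.StringIO()  # swap for open("counts.csv", "w", newline="") to write a file
writer = csv.DictWriter(out, fieldnames=["file"] + columns)
writer.writeheader()

for name, text in sorted(texts.items()):
    row = dict.fromkeys(columns, 0)  # every word starts at zero
    for word in re.findall(r"\w+", text.lower()):
        if word in row:
            row[word] += 1
    row["file"] = name
    writer.writerow(row)

print(out.getvalue())
```

Because each row starts from `dict.fromkeys(columns, 0)`, words that never occur in a file still get a 0 in their column, and the header order is fixed by `columns` rather than by frequency.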

Answered 2025-04-17 by Python大师
