Loops and totals in Python

Asked 2025-04-17 05:11

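I'm trying to write a program that draws a histogram of the lengths of the words in a list. So far I've got this much: each time the loop finds a word of a particular length, it adds one to the count of words of that length. My code currently looks like this: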

L = []
for i in range(L):
length = len(i)
for len(i) = 1:
    total1 = total1 + 1
for len(i) = 2:
    total2= total2 + 1
for len(i) = 3:
    total3 = total3 + 1
for len(i) = 4:
    total4 = total4 + 1
for len(i) = 5:
    total5 = total5 + 1

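This is obviously a clumsy approach, though, because I need a differently named total for every length, which could mean something like 11 different names. So my question is, can I simply use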

L = []
for i in range(L):
length = len(i)
for len(i) = n:
    totaln = totaln + 1

to cover every length, and then refer to it later as, say, total4? Or will the interpreter complain because total4 was never explicitly defined?

I think I can manage the rest of the code; it's just this one point that confuses me, since I'm very new to programming.

4 Answers

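This is a good example of where the Natural Language Toolkit (NLTK) comes in handy. The toolkit may be overkill for what you want to do, but it can also be a shortcut. There is no need to reinvent the wheel just to produce a chart.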

For example, suppose I have the following text (temp.txt)

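How much wood would a woodchuck chuck If a woodchuck could chuck wood? He would chuck, he would, as much as he could, And chuck as much wood as a woodchuck would If a woodchuck could chuck wood.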

Here is the code to generate and plot a simple frequency distribution

from nltk.tokenize import word_tokenize as tokenize
from nltk.probability import FreqDist
from nltk.text import Text
def freqDist(infile):
  '''tokenize and return a simple fd'''
  fn = open(infile,'r')  # read the file passed in, not a hard-coded path
  tokens = tokenize(fn.read())
  fn.close()
  t = Text(tokens)
  fd = FreqDist(t)
  return fd

Setting the function wrapper aside for a moment, let's see what each step gives us

>>> tokens
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'If', 'a', 'woodchuck', 'could', 'chuck', 'wood', '?', 'He', 'would', 'chuck', ',', 'he', 'would', ',', 'as', 'much', 'as', 'he', 'could', ',', 'And', 'chuck', 'as', 'much', 'wood', 'as', 'a', 'woodchuck', 'would', 'If', 'a', 'woodchuck', 'could', 'chuck', 'wood', '.']
>>> t
<Text: How much wood would a woodchuck chuck If...>
>>> fd
<FreqDist with 43 outcomes>
>>> fd['wood']
4

Finally

>>> # Frequency of the top five words
>>> fd.plot(5)

Figure: frequency of the top five words

But what you want is the word lengths!

We can produce that with a simple change to the code

from nltk.tokenize import word_tokenize as tokenize
from nltk.probability import FreqDist
from nltk.text import Text

def fDist(infile):
    '''tokenize and return a simple fd'''
    fn = open(infile,'r') 
    tokens = tokenize(fn.read())
    token_lengths = [len(token) for token in tokens]
    # if you want to include only words (not punctuation), use this instead:
    #token_lengths = [len(token) for token in tokens if token.isalpha()]
    fn.close()
    t = Text(token_lengths)
    fd = FreqDist(t)
    return fd

So...

>>> fd = fDist('/home/user/temp.txt')
>>> fd.plot()

Figure: frequency distribution of word lengths
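To see what the plotted distribution actually contains, here is a rough sketch that tallies token lengths for the woodchuck text without NLTK. It swaps in `collections.Counter` for `FreqDist` and a small regular expression for `word_tokenize`; both substitutions, along with the names `TEXT` and `length_counts`, are assumptions made for illustration and are not part of the answer's code.

```python
import re
from collections import Counter

TEXT = ("How much wood would a woodchuck chuck If a woodchuck could chuck wood? "
        "He would chuck, he would, as much as he could, And chuck as much wood "
        "as a woodchuck would If a woodchuck could chuck wood.")

def length_counts(text, words_only=False):
    """Tally token lengths; a stand-in for FreqDist over token_lengths."""
    # Hypothetical tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if words_only:
        # Same filter as the commented-out line in fDist: drop punctuation.
        tokens = [tok for tok in tokens if tok.isalpha()]
    return Counter(len(tok) for tok in tokens)

all_counts = length_counts(TEXT)
word_counts = length_counts(TEXT, words_only=True)
print(sorted(all_counts.items()))
print(sorted(word_counts.items()))
```

Comparing the two printouts shows how much the punctuation tokens inflate the count for length 1, which is the reason for the `isalpha()` variant.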

Let's put it all together into something useful and reusable

from nltk.tokenize import word_tokenize as tokenize
from nltk.probability import FreqDist
from nltk.text import Text
import sys

def fDist(infile):
    '''tokenize and return a simple fd'''
    fn = open(infile,'r') 
    tokens = tokenize(fn.read())
    token_lengths = [len(token) for token in tokens]
    # if you want to include only words (not punctuation), use this instead:
    #token_lengths = [len(token) for token in tokens if token.isalpha()]
    fn.close()
    t = Text(token_lengths)
    fd = FreqDist(t)
    return fd

def main():
    fd = fDist(sys.argv[1])
    fd.plot()

if __name__ == '__main__':
    main()

You can now invoke the above from the command line like this:

  ./fdist.py infile.txt

You might also add some checks to make sure you have a valid file, but that is beyond the scope of this question.
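For what it's worth, such a check could look roughly like the sketch below. The helper name `checked_path` and the wording of the messages are made up for illustration; the only library calls used are `os.path.isfile` and `sys.exit`.

```python
import os
import sys

def checked_path(argv):
    """Return the input path from argv, or exit with a short message."""
    # Expect exactly one argument after the script name.
    if len(argv) != 2:
        sys.exit("usage: fdist.py INFILE")
    path = argv[1]
    # Refuse anything that is not an existing regular file.
    if not os.path.isfile(path):
        sys.exit("not a file: " + path)
    return path
```

In `main()` this would be used as `fd = fDist(checked_path(sys.argv))`, so a missing or mistyped file name produces a one-line message instead of a traceback.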

Answered 2025-04-17 by Python大师

This is not a direct answer to your question, but it will help you more in the long run.

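Your Python fundamentals are almost nonexistent. I've looked at your other questions, and it seems you are trying to run before you can walk. I suggest picking a tutorial from the Beginner's Guide for Non-Programmers and carefully working through every example. If there is a specific concept you don't understand, ask about it here.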

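After finishing those tutorials, I think How to Think Like a Computer Scientist, 2nd Edition would suit you very well. Every chapter ends with great exercises.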

Answered 2025-04-17 by Python大师

In Python 2.7 or later, you can use a tool called Counter:

from collections import Counter
a = ["basically", "in", "Python", "I", "am", "trying", "to", "write",
     "a", "program", "to", "draw", "a", "histogram", "of", "the",
     "lengths", "of", "words", "present", "in", "a", "list"]
print(Counter(map(len, a)))  # parenthesized so it also runs on Python 3

This prints (the order of the entries may differ between Python versions)

Counter({2: 7, 1: 4, 7: 3, 4: 2, 5: 2, 6: 2, 9: 2, 3: 1})

The result is a dictionary that maps each word length to how often it occurs.
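To connect this back to the question: a single Counter keyed by length can stand in for the separate total1, total2, ... variables. The sketch below uses a shortened word list, and the helper `text_histogram` is a made-up name for illustration, not something the standard library provides.

```python
from collections import Counter

words = ["basically", "in", "Python", "I", "am", "trying", "to", "write"]
counts = Counter(len(w) for w in words)

# counts[n] plays the role of the hand-named totaln variables.
total2 = counts[2]
# A length that never occurred is reported as 0 instead of raising KeyError.
total4 = counts[4]

def text_histogram(counts):
    """Return one line per length, with one '*' per word of that length."""
    return ["%2d %s" % (n, "*" * counts[n]) for n in sorted(counts)]

for line in text_histogram(counts):
    print(line)
```

Because the lengths are just dictionary keys, nothing has to be declared in advance for lengths 1 through 11; whatever lengths appear in the data get a count.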

Answered 2025-04-17 by Python大师
