Python, probability

2 votes
3 answers
1626 views
Asked 2025-04-16 08:25

My code is as follows:

with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1

list= [(count, char) for char, count in frequencies.iteritems()]

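This code opens test.txt, reads every line, and lists the count of each character in a format like [(3, 'a'), ...]. This means the whole text file contains three 'a' characters, and so on.

What I need is not the number 3 itself but [3 / total number of characters]. In other words, I don't need how many times 'a' appears in the text; I need the probability of 'a'.

For example, if the text (test.txt) contains "aaab", the output I want is the list [(0.75, 'a'), (0.25, 'b')].

Thank you very much for your help.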


Edit 2

import collections
frequencies = collections.defaultdict(int)



with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
total = float(sum(frequencies.keys()))

verj= [(count/total, char) for char, count in frequencies.iteritems()]

This code does not work; it raises the following error:

total = float(sum(frequencies.keys()))

TypeError: unsupported operand type(s) for +: 'int' and 'str'

3 Answers

0

Quick and simple:

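counter = 0
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
            counter += 1  # running total of all characters seen

# float() avoids integer division in Python 2, which would turn every result into 0
probs = [(count / float(counter), char) for char, count in frequencies.iteritems()]

1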

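You're almost there. Sum the counts, frequencies.values(), rather than the characters, frequencies.keys(); adding the string keys to an integer is what raised the TypeError.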

with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
total = float(sum(frequencies.values()))
symbols = [(count/total, char) for char, count in frequencies.iteritems()]

Note that I renamed your resulting list, because list is a built-in name and should not be used as a variable or function name.
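
As a side note, the same computation could be sketched with collections.Counter. This is only a minimal sketch, assuming Python 3 (where iteritems() is spelled items() and / is already true division). The function name char_probabilities and its text parameter are hypothetical, chosen for illustration, and the function takes a string instead of reading test.txt so that it is self-contained.

```python
from collections import Counter


def char_probabilities(text):
    """Return (probability, char) pairs for every character in text."""
    counts = Counter(text)            # maps each char to its number of occurrences
    total = sum(counts.values())      # total number of characters
    if total == 0:
        return []                     # empty input: nothing to normalize
    # Python 3: / is true division, so no float() conversion is needed
    return [(count / total, char) for char, count in counts.items()]


print(char_probabilities("aaab"))
```

For the "aaab" text from the question this gives 0.75 for 'a' and 0.25 for 'b'. To use it on a file, the contents could be passed in with something like char_probabilities(open("test.txt").read()); as in the loop above, newline characters are counted as characters too.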

1

If frequencies = {"a": 3, "b": 4}, then frequencies.values() gives us [3, 4], and we can compute their sum:

total = float(sum(frequencies.values()))

Then we can compute the probabilities:

probs = [(count / total, char) for char, count in frequencies.iteritems()]

Note that in Python 2, dividing two integers returns an integer, which is why I convert the total to a float first:

Python 2.7 (r27:82508, Jul  3 2010, 21:12:11) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 3 / 4
0
>>> 3 / 4.0
0.75
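
Putting the pieces of this answer together, here is a small worked sketch for the sample dictionary {"a": 3, "b": 4}. It is written so that it should run on both Python 2 and Python 3, which is an assumption beyond the original answer: it uses .items() instead of .iteritems(), and keeps the float() conversion even though Python 3 no longer needs it. (On Python 2, placing from __future__ import division at the top of the file is another way to get true division.)

```python
frequencies = {"a": 3, "b": 4}             # sample counts from this answer

# float() keeps the division below from truncating on Python 2;
# on Python 3 it is harmless.
total = float(sum(frequencies.values()))   # 3 + 4 = 7.0

# .items() works on both Python 2 and 3 (.iteritems() is Python 2 only).
probs = sorted((count / total, char) for char, count in frequencies.items())

for prob, char in probs:
    print("%s: %.3f" % (char, prob))
```

The two probabilities are 3/7 and 4/7, so they add up to 1, which is a quick sanity check worth running on real data as well.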
