Python, probability
Here is my code:
import collections

frequencies = collections.defaultdict(int)
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
list = [(count, char) for char, count in frequencies.iteritems()]
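This code opens test.txt, reads each line, and lists how many times each character occurs, in the form [(3, 'a'), ...]. That means there are three 'a' characters in the whole text file, and so on.
What I need is not the count 3 but 3 divided by the total number of characters, i.e. [3 / total number of characters]. In other words, I don't need to know how many times 'a' appears in the text; I need the probability of 'a'.
For example, if the text in test.txt is "aaab", I want the output to be the list [(0.75, 'a'), (0.25, 'b')].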
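A minimal sketch of the requested computation on the example string, written for Python 3 (where items() replaces iteritems() and / always returns a float). It swaps the defaultdict loop for collections.Counter; the function name char_probabilities is hypothetical, not part of the original code.

```python
from collections import Counter


def char_probabilities(text):
    """Return [(probability, char), ...] for each character in text (hypothetical helper)."""
    counts = Counter(text)        # e.g. Counter({'a': 3, 'b': 1})
    total = sum(counts.values())  # total number of characters
    if total == 0:
        return []                 # empty input: no probabilities to report
    return [(count / total, char) for char, count in counts.items()]


print(char_probabilities("aaab"))  # [(0.75, 'a'), (0.25, 'b')]
```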
Thanks a lot for your help.
Edit 2
import collections

frequencies = collections.defaultdict(int)
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
total = float(sum(frequencies.keys()))
verj = [(count/total, char) for char, count in frequencies.iteritems()]
This code doesn't work; it gives me this error:
total = float(sum(frequencies.keys()))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
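The error comes from summing the dictionary's keys, which are the characters themselves: sum() starts from the integer 0 and then tries 0 + 'a'. A small Python 3 reproduction, using a hypothetical two-entry dictionary in place of the counts from the file:

```python
frequencies = {"a": 3, "b": 1}  # hypothetical counts: char -> occurrences

try:
    sum(frequencies.keys())  # keys are strings, so 0 + "a" raises TypeError
except TypeError as exc:
    error_message = str(exc)

total = sum(frequencies.values())  # values are the integer counts

print(error_message)  # unsupported operand type(s) for +: 'int' and 'str'
print(total)          # 4
```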
3 answers
0
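Quick and simple:
import collections

frequencies = collections.defaultdict(int)
counter = 0  # total number of characters seen
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
            counter += 1
# float() avoids Python 2 integer division, which would turn every ratio into 0
probs = [(count / float(counter), char) for char, count in frequencies.iteritems()]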
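The same one-pass idea, sketched for Python 3 and fed from an in-memory io.StringIO instead of test.txt so that it runs on its own (the function name probabilities_from_lines is hypothetical). Worth knowing: iterating over a file yields lines that still end in a newline, so '\n' is counted as a character as well.

```python
import collections
import io


def probabilities_from_lines(lines):
    """Count characters and the running total in a single pass (hypothetical helper)."""
    frequencies = collections.defaultdict(int)
    counter = 0
    for line in lines:
        for char in line:
            frequencies[char] += 1
            counter += 1
    if counter == 0:
        return []  # empty input: avoid dividing by zero
    return [(count / counter, char) for char, count in frequencies.items()]


fake_file = io.StringIO("aaab\n")  # stands in for open("test.txt")
print(probabilities_from_lines(fake_file))
# [(0.6, 'a'), (0.2, 'b'), (0.2, '\n')]
```

If the newline should not count, strip it from each line (for example with line.rstrip("\n")) before the inner loop.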
1
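You're almost there.
import collections

frequencies = collections.defaultdict(int)
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
# sum the counts (the values), not the characters (the keys)
total = float(sum(frequencies.values()))
symbols = [(count/total, char) for char, count in frequencies.iteritems()]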
Note that I renamed the list you ended up with, because list is a built-in name and shouldn't be used to name variables or functions.
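A short illustration of why shadowing list causes trouble: once the name is rebound to a list object, calling list(...) later in the same scope fails. The function demo_shadowing is hypothetical and exists only for this demonstration.

```python
def demo_shadowing():
    list = [(3, "a")]  # rebinds the name "list" inside this function
    try:
        list("abc")    # the built-in list() is no longer reachable by this name
    except TypeError as exc:
        return str(exc)
    return None


print(demo_shadowing())  # 'list' object is not callable
```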
1
If frequencies = {"a": 3, "b": 4}, then frequencies.values() gives us [3, 4], and we can compute their sum:
total = float(sum(frequencies.values()))
Then we can compute the probabilities:
probs = [(count / total, char) for char, count in frequencies.iteritems()]
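Working the answer's example through in Python 3 (where iteritems() is spelled items()): the counts 3 and 4 sum to 7, so the probabilities are 3/7 and 4/7, and they add up to 1 up to floating-point rounding.

```python
import math

frequencies = {"a": 3, "b": 4}  # the example counts from this answer

total = float(sum(frequencies.values()))  # 7.0
probs = [(count / total, char) for char, count in frequencies.items()]

print(total)  # 7.0
print(probs)
# The probabilities should sum to 1, allowing for float rounding.
print(math.isclose(sum(p for p, _ in probs), 1.0))  # True
```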
Note that in Python 2, dividing one integer by another returns an integer (the result is floored), which is why I convert the sum to a float first:
Python 2.7 (r27:82508, Jul 3 2010, 21:12:11)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 3 / 4
0
>>> 3 / 4.0
0.75
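The session above is Python 2. In Python 3, / between two integers already returns a float and // is the floor-division operator, so the float() conversion is harmless but no longer required. On Python 2, from __future__ import division switches / to the Python 3 behaviour. A quick check of the Python 3 semantics:

```python
# Python 3 division semantics
true_div = 3 / 4          # true division -> 0.75
floor_div = 3 // 4        # floor division -> 0
converted = 3 / float(4)  # the answer's float() trick still gives 0.75

print(true_div, floor_div, converted)  # 0.75 0 0.75
```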