How do I bin a series of float values into a histogram in Python?
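I have a set of float values (always less than 1) that I want to bin into a histogram, i.e. each bar in the histogram covers a range of values, for example [0, 0.150).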
The data I have looks like this:
0.000
0.005
0.124
0.000
0.004
0.000
0.111
0.112
With my code below, I expect to get a result like this:
[0, 0.005) 5
[0.005, 0.011) 0
...etc..
I tried to do the binning with the code below, but it doesn't seem to work. What is the right way to do it?
#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;
    diffCounts[ str(getBin(diff)) ] += 1
maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)
for i in range(maxBin+1):
    lo = 2**i - 1
    hi = 2**(i+1) - 1
    binStr = '[' + str(lo) + ',' + str(hi) + ')'
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
3 Answers
3
The first error is:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers
Why are you converting an int to a string where an int is needed? List indices must be integers. Fix that, and we get:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range
That's because you only created 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 21, in <module>
maxBin = max(maxdiff)
TypeError: 'int' object is not iterable
maxdiff is a single value taken from your list of ints, so what is max doing here? Remove it, and now we get:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 28, in <module>
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration
Sure enough, you are passing a single value as the second argument to map. Let's simplify the last two lines from this:
binStr = '[' + str(lo) + ',' + str(hi) + ')'
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
to this:
print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])
Now it prints:
6
[0.000000, 1.000000) 3
[1.000000, 3.000000) 0
[3.000000, 7.000000) 2
[7.000000, 15.000000) 0
[15.000000, 31.000000) 0
[31.000000, 63.000000) 0
[63.000000, 127.000000) 3
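I'm not sure what to do next, because I don't really understand the bucketing you are hoping to use. It seems to involve powers of two, but the intent isn't clear to me...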
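For reference, the fixes above can be collected into one short script. This is only a sketch under a few assumptions: it targets Python 3 (the question's code is Python 2), it swaps math.log(x+1)/log2 for math.log2 to avoid rounding trouble at exact powers of two, it uses a Counter instead of a fixed-size list so no bucket count has to be guessed, and the names get_bin and log_histogram are made up for illustration. The sample data is hard-coded instead of read through fileinput.

```python
import math
from collections import Counter


def get_bin(x):
    """Index i of the bucket [2**i - 1, 2**(i+1) - 1) that holds x (x >= 0)."""
    return int(math.log2(x + 1))


def log_histogram(values, scale=1000):
    """Return (lo, hi, count) rows for power-of-two buckets of values * scale.

    Rows run from bucket 0 up to the highest non-empty bucket,
    so empty buckets in between are reported with a count of 0.
    """
    counts = Counter(get_bin(v * scale) for v in values)
    if not counts:
        return []
    top = max(counts)  # highest bucket index that received a value
    return [(2**i - 1, 2**(i + 1) - 1, counts[i]) for i in range(top + 1)]


# Sample values from the question (assumed here instead of reading a file).
data = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]

for lo, hi, n in log_histogram(data):
    print(f"[{lo}, {hi})\t{n}")
```

On the sample data this reproduces the bucket counts printed above (3, 0, 2, 0, 0, 0, 3).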
4
from pylab import *
data = []
inf = open('pulse_data.txt')
for line in inf:
    data.append(float(line))
inf.close()
# binning
B = 50
minv = min(data)
maxv = max(data)
bincounts = []
for i in range(B+1):
    bincounts.append(0)
for d in data:
    b = int((d - minv) / (maxv - minv) * B)
    bincounts[b] += 1
# plot histogram
plot(bincounts,'o')
show()
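The loop above spreads the values over B+1 equal-width slots between the minimum and the maximum and only plots the counts. If the goal is the textual "[lo, hi) count" listing from the question, a fixed bin width may be simpler. The following is a minimal Python 3 sketch, not the answer author's method: fixed_width_histogram and its width parameter are hypothetical names, the values are assumed to be non-negative, and the sample data is hard-coded.

```python
from collections import Counter


def fixed_width_histogram(values, width):
    """Count non-negative values in half-open bins [k*width, (k+1)*width).

    Note: the bin index comes from float division, so a value that sits
    exactly on a bin edge can land in a neighbouring bin due to rounding.
    """
    counts = Counter(int(v / width) for v in values)
    if not counts:
        return []
    top = max(counts)  # index of the last non-empty bin
    return [(k * width, (k + 1) * width, counts[k]) for k in range(top + 1)]


# Sample values from the question (assumed here instead of reading a file).
data = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]

for lo, hi, n in fixed_width_histogram(data, 0.005):
    print(f"[{lo:.3f}, {hi:.3f})\t{n}")
```

Because the intervals are half-open, the value 0.005 is counted in [0.005, 0.010) and not in [0.000, 0.005).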
18
Try not to reinvent the wheel. NumPy already has everything you need:
#!/usr/bin/env python
import numpy as np
a = np.fromfile(open('file', 'r'), sep='\n')
# [ 0. 0.005 0.124 0. 0.004 0. 0.111 0.112]
# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0. 0.15]
# Or, if bin is an integer, you can set the number of bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0. 0.031 0.062 0.093 0.124]
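To get from np.histogram to the "[lo, hi) count" listing the question asks for, the returned edges can be paired up with the counts. A small Python 3 sketch follows; the three bins between 0 and 0.15 are an arbitrary choice for illustration, and the data is hard-coded instead of read from a file. Keep in mind that np.histogram treats every bin as half-open except the last one, which also includes its right edge.

```python
import numpy as np

# Sample values from the question (assumed here instead of reading a file).
a = np.array([0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112])

# Four evenly spaced edges give three bins: [0, 0.05), [0.05, 0.10), [0.10, 0.15].
edges = np.linspace(0.0, 0.15, 4)
hist, bin_edges = np.histogram(a, bins=edges)

# Pair each left edge with its right edge and the matching count.
for lo, hi, n in zip(bin_edges[:-1], bin_edges[1:], hist):
    print(f"[{lo:.3f}, {hi:.3f})\t{n}")
```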