如何获取直方图的数据信息

Question

我想要获取直方图某个区间里包含的数据列表。我正在使用numpy和Matplotlib。我知道怎么遍历数据并检查区间的边界。不过，我想做的是二维直方图，而实现这个的代码看起来有点复杂。请问numpy有没有什么方法可以让这个过程更简单一些？

对于一维的情况，我可以使用searchsorted()。但是逻辑上并没有好多少，我也不想在每个数据点上都进行二分查找，特别是当我不需要这样做的时候。

大部分复杂的逻辑都是因为区间的边界。所有的区间都有这样的边界：[左边界, 右边界)。除了最后一个区间，它的边界是这样的：[左边界, 右边界]。

这里是一些一维情况的示例代码：

import numpy as np

data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]

hist, edges = np.histogram(data, bins=3)

print 'data =', data
print 'histogram =', hist
print 'edges =', edges

getbin = 2  #0, 1, or 2

print '---'
print 'alg 1:'

#for i in range(len(data)):
for d in data:
    if d >= edges[getbin]:
        if (getbin == len(edges)-2) or d < edges[getbin+1]:
            print 'found:', d
        #end if
    #end if
#end for

print '---'
print 'alg 2:'

for d in data:
    val = np.searchsorted(edges, d, side='right')-1
    if val == getbin or val == len(edges)-1:
        print 'found:', d
    #end if
#end for

这里是一些二维情况的示例代码：

import numpy as np

xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5,5, 5, 5, 5, \
         15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, \
         25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]

xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))

print 'data2d =', zip(xdata, ydata)
print 'hist2d ='
print hist2d
print 'xedges =', xedges
print 'yedges =', yedges

getbin2d = 5  #0 through 8

print 'find data in bin #', getbin2d

xedge_i = getbin2d % xbins
yedge_i = int(getbin2d / xbins) #IMPORTANT: this is xbins

for x, y in zip(xdata, ydata):
    # x and y left edges
    if x >= xedges[xedge_i] and y >= yedges[yedge_i]:
        #x right edge
        if xedge_i == xbins-1 or x < xedges[xedge_i + 1]:
            #y right edge
            if yedge_i == ybins-1 or y < yedges[yedge_i + 1]:
                print 'found:', x, y
            #end if
        #end if
    #end if
#end for

有没有更简洁或更高效的方法来实现这个？看起来numpy应该有相关的功能。

数据处理 numpy 直方图数据分析 matplotlib 二维直方图区间边界 searchsorted

如何获取直方图的数据信息

2 个回答

撰写回答