How to create bins

Asked 2025-04-18 14:28

At the moment I create several bins with the following code:

# volume of the spherical shell between radii i*bin_width and (i+1)*bin_width
bin_volumes = [((i+1)**3 - i**3) * bin_width**3 * 4 * np.pi / 3 for i in range(num_bins)]
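Each element of `bin_volumes` is the volume of the spherical shell between radii `i*bin_width` and `(i+1)*bin_width`, because (4π/3)((i+1)w)^3 - (4π/3)(iw)^3 = (4π/3)((i+1)^3 - i^3)w^3. Below is a quick sketch that checks this. It assumes `bin_width = 0.1` and `num_bins = 39`, which are inferred from the bin list in the question and not stated there. The shells tile a full sphere of radius `num_bins * bin_width`, so their volumes should add up to that sphere's volume.

```python
import math

import numpy as np

bin_width = 0.1  # assumed from the bin list in the question
num_bins = 39    # assumed: bins run from 0.0 up to 3.9

# Volume of the spherical shell i*bin_width <= r < (i+1)*bin_width
bin_volumes = [((i + 1)**3 - i**3) * bin_width**3 * 4 * np.pi / 3
               for i in range(num_bins)]

# The shells tile a sphere of radius R = num_bins * bin_width,
# so the sum telescopes to 4/3 * pi * R**3.
R = num_bins * bin_width
sphere_volume = 4 / 3 * np.pi * R**3
print(math.isclose(sum(bin_volumes), sphere_volume))  # True
```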

The bins look like this:

[(0.0, 0.10000000000000001), (0.10000000000000001, 0.20000000000000001), (0.20000000000000001, 0.30000000000000004), (0.30000000000000004, 0.40000000000000002), (0.40000000000000002, 0.5), (0.5, 0.60000000000000009), (0.60000000000000009, 0.70000000000000007), (0.70000000000000007, 0.80000000000000004), (0.80000000000000004, 0.90000000000000002), (0.90000000000000002, 1.0), (1.0, 1.1000000000000001), (1.1000000000000001, 1.2000000000000002), (1.2000000000000002, 1.3), (1.3, 1.4000000000000001), (1.4000000000000001, 1.5), (1.5, 1.6000000000000001), (1.6000000000000001, 1.7000000000000002), (1.7000000000000002, 1.8), (1.8, 1.9000000000000001), (1.9000000000000001, 2.0), (2.0, 2.1000000000000001), (2.1000000000000001, 2.2000000000000002), (2.2000000000000002, 2.3000000000000003), (2.3000000000000003, 2.4000000000000004), (2.4000000000000004, 2.5), (2.5, 2.6000000000000001), (2.6000000000000001, 2.7000000000000002), (2.7000000000000002, 2.8000000000000003), (2.8000000000000003, 2.9000000000000004), (2.9000000000000004, 3.0), (3.0, 3.1000000000000001), (3.1000000000000001, 3.2000000000000002), (3.2000000000000002, 3.3000000000000003), (3.3000000000000003, 3.4000000000000004), (3.4000000000000004, 3.5), (3.5, 3.6000000000000001), (3.6000000000000001, 3.7000000000000002), (3.7000000000000002, 3.8000000000000003), (3.8000000000000003, 3.9000000000000004)

The data look like this:

3.615
4.42745271008
2.55619101399
2.55619101399
2.55619101399
4.42745271008
3.615
2.55619101399
4.42745271008
5.71581687075
5.71581687075
3.615
2.55619101399
2.55619101399
2.55619101399
2.55619101399
2.55619101399
2.55619101399

Every time a data point falls inside a bin, I want to increment that bin's count, so that I get the "frequency" of each bin and can then plot those frequencies.
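The "increment a counter" idea can be sketched directly. Because all bins have the same width and start at 0, the index of the bin containing a value is `floor(value / bin_width)`, so there is no need to test every tuple. This is a minimal sketch that assumes `bin_width = 0.1` and `num_bins = 39` as in the bin list above; `count_per_bin` is a hypothetical helper name. Values beyond the last bin, such as 4.42745271008 and 5.71581687075 in the sample data, are counted separately instead of being silently dropped.

```python
import math


def count_per_bin(data, bin_width, num_bins):
    """Count how many values fall into each bin [i*w, (i+1)*w)."""
    counts = [0] * num_bins
    out_of_range = 0
    for value in data:
        i = math.floor(value / bin_width)  # index of the bin containing value
        if 0 <= i < num_bins:
            counts[i] += 1                 # one more data point in bin i
        else:
            out_of_range += 1              # below 0 or beyond the last bin
    return counts, out_of_range


# Sample values from the question
data = [3.615, 4.42745271008, 2.55619101399, 2.55619101399, 2.55619101399,
        4.42745271008, 3.615, 2.55619101399, 4.42745271008, 5.71581687075,
        5.71581687075, 3.615, 2.55619101399, 2.55619101399, 2.55619101399,
        2.55619101399, 2.55619101399, 2.55619101399]

counts, out_of_range = count_per_bin(data, bin_width=0.1, num_bins=39)
print(counts[25], counts[36], out_of_range)  # 10 3 5
```

One caveat: a value sitting exactly on a bin edge can land in the neighbouring bin, because edges such as 0.30000000000000004 are not exact decimals.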

The code that creates the data:

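import operator
from functools import reduce

import numpy as np

# l is the list of 2048 points; l[i][0], l[i][1], l[i][2] are x, y, z
data = []  # one distance per pair of points
for b in range(2047):
    for a in range(b+1, 2048):
        vector1 = (l[b][0], l[b][1], l[b][2])
        vector2 = (l[a][0], l[a][1], l[a][2])

        # difference vector between point b and point a
        vector3 = list(np.array(vector1) - np.array(vector2))

        # dot product of vector3 with itself = squared distance
        dotProduct = reduce(operator.add, map(operator.mul, vector3, vector3))

        dp = dotProduct**.5  # Euclidean distance between the two points
        data.append(dp)      # store this pair's distance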
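The double loop computes the Euclidean distance for every pair of points, i.e. 2048 * 2047 / 2 = 2,096,128 distances. NumPy can compute the same set of distances without Python-level loops. This is a sketch under stated assumptions: `l` can be converted to an array of shape (N, 3) or wider, random points stand in for the real data, and `pairwise_distances` is a hypothetical helper name. It builds an N x N x 3 intermediate array, which is roughly 100 MB of float64 for N = 2048.

```python
import numpy as np


def pairwise_distances(points):
    """Distance for every pair (b, a) with b < a, in the same order
    as the nested loops in the question."""
    pts = np.asarray(points, dtype=float)[:, :3]  # keep only x, y, z
    diff = pts[:, None, :] - pts[None, :, :]      # shape (N, N, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # shape (N, N)
    b, a = np.triu_indices(len(pts), k=1)         # all pairs with b < a, row by row
    return dist[b, a]


# Stand-in for the question's list l of 2048 points (hypothetical data)
rng = np.random.default_rng(0)
l = rng.random((2048, 3)) * 4.0

data = pairwise_distances(l)
print(len(data))  # 2096128
```

`np.triu_indices(n, k=1)` walks the upper triangle row by row, which is why the result comes out in the same order as `for b ...: for a in range(b+1, ...)`.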

1 Answer


This code generates a list of tuples, each of which defines the range of one bin:

bins = [(i*bin_width, (i+1)*bin_width) for i in range(num_bins)]
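In the tuple list, every interior boundary appears twice. The sketch below shows the equivalent edge representation, with `num_bins + 1` boundaries from which the same tuples can be rebuilt. It assumes `bin_width = 0.1` and `num_bins = 39` as in the question. This edge array is also the form that NumPy's histogram functions accept for their `bins` argument.

```python
import numpy as np

bin_width = 0.1  # assumed, matching the question's bins
num_bins = 39

# Tuple form, as in the answer
bins = [(i*bin_width, (i+1)*bin_width) for i in range(num_bins)]

# Edge form: num_bins + 1 boundaries, each computed as i * bin_width
edges = np.arange(num_bins + 1) * bin_width

# Pair consecutive edges to get back the (low, high) tuples
bins_from_edges = list(zip(edges[:-1].tolist(), edges[1:].tolist()))
print(bins_from_edges == bins)  # True
```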

If you have a list of data values, for example:

data = [0.7, 2.8, 1.3]

then you can count how many values fall into each bin like this:

[sum([(value >= low) and (value < high) for value in data]) for low, high in bins]
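NumPy can do the same counting in one call. The sketch below runs the list comprehension and `np.histogram` on the same bin edges side by side, so the two results can be compared. It assumes `bin_width = 0.1` and `num_bins = 39` as in the question.

```python
import numpy as np

bin_width = 0.1  # assumed, as in the question
num_bins = 39
data = [0.7, 2.8, 1.3]

# Counting with the tuple list, as in the answer
bins = [(i*bin_width, (i+1)*bin_width) for i in range(num_bins)]
counts_loop = [sum([(value >= low) and (value < high) for value in data])
               for low, high in bins]

# Same counts with NumPy, passing the num_bins + 1 bin edges
edges = np.arange(num_bins + 1) * bin_width
counts_np, _ = np.histogram(data, bins=edges)

print(counts_np.tolist() == counts_loop)  # True
```

`np.histogram` uses the same half-open intervals [low, high) except for the last bin, which also includes its right edge. Values outside the outermost edges are not counted by either version. For the plot, the counts can be drawn as bars starting at each left edge, e.g. `plt.bar(edges[:-1], counts_np, width=bin_width, align="edge")` with matplotlib.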
