Resize or rebin a numpy 2D array by averaging

数据工程师

Asked 2025-04-17 06:06

I'm trying to reimplement an IDL function in Python:

http://star.pst.qub.ac.uk/idl/REBIN.html

It downsizes a 2D array by an integer factor, averaging the values that fall into each block.

For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by averaging the relevant samples. The expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

That is, b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.
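To pin down these semantics, here is a deliberately slow reference version that computes each b[i, j] as the mean of its block with explicit loops. It is only a sketch for checking faster implementations against; the name rebin_loop is hypothetical, and it assumes the new shape divides the old shape exactly.

```python
import numpy as np


def rebin_loop(a, shape):
    """Hypothetical reference rebin: average each non-overlapping block.

    Assumes each new dimension divides the corresponding old one exactly.
    """
    rows, cols = shape
    if a.shape[0] % rows or a.shape[1] % cols:
        raise ValueError("new shape must divide the old shape")
    fr, fc = a.shape[0] // rows, a.shape[1] // cols  # block height and width
    b = np.empty(shape, dtype=float)
    for i in range(rows):
        for j in range(cols):
            # b[i, j] is the mean of the fr x fc block starting at (i*fr, j*fc)
            b[i, j] = a[i * fr:(i + 1) * fr, j * fc:(j + 1) * fc].mean()
    return b


a = np.arange(24).reshape((4, 6))
print(rebin_loop(a, (2, 3)))
```

This is O(number of output cells) Python-level iterations, so it is only useful as a correctness check.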

I think I should reshape the array to 4D and then take the mean over the right slices, but I can't figure out the algorithm. Could you give me a hint?

5 Answers

Here is a way to do what you want using matrix multiplication; it does not require the new array dimensions to divide the old ones evenly.

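First, we generate a row compressor matrix and a column compressor matrix (I'm sure there is a cleaner way to do this, possibly with numpy operations alone):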

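import numpy as np


def get_row_compressor(old_dimension, new_dimension):
    # Weight matrix of shape (new_dimension, old_dimension): row i holds the
    # (fractional) share of each old cell that falls into new bin i.
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        # Rounded so floating-point noise doesn't create spurious tiny weights.
        remaining = round(next_bin_break - which_column, 10)
        if remaining >= 1:
            # Old cell lies entirely inside the current bin.
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif remaining == 0:
            # Bin boundary falls exactly on a cell boundary: start the next bin.
            which_row += 1
            next_bin_break += bin_size
        else:
            # Old cell straddles the boundary: split its weight between two bins.
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    # Normalize so each row's weights sum to 1 (i.e. each bin is an average).
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()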

...so, for example, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]
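As the answer suspects, the compressor can also be built with numpy operations alone. One way (a sketch, not the answerer's code; overlap_compressor is a hypothetical name) is to treat output bin i as the interval [i*s, (i+1)*s) with s = old/new, treat old cell j as [j, j+1), and use the length of their overlap divided by s as the weight. It should reproduce the two matrices above, assuming the new dimension is not larger than the old one.

```python
import numpy as np


def overlap_compressor(old_dim, new_dim):
    """Hypothetical vectorized alternative to get_row_compressor.

    Weight (i, j) = overlap of bin [i*s, (i+1)*s) with cell [j, j+1), divided
    by s, so each row sums to 1. Assumes new_dim <= old_dim.
    """
    s = old_dim / new_dim                     # bin width in old-cell units
    start = np.arange(new_dim)[:, None] * s   # bin starts, shape (new_dim, 1)
    stop = start + s                          # bin ends
    j = np.arange(old_dim)[None, :]           # cell starts, shape (1, old_dim)
    overlap = np.minimum(stop, j + 1) - np.maximum(start, j)
    return np.clip(overlap, 0, None) / s      # negative overlap means "disjoint"


print(overlap_compressor(5, 3))    # row compressor for 5 -> 3
print(overlap_compressor(3, 2).T)  # column compressor for 3 -> 2 (transposed)
```

With this helper, overlap_compressor(n_old, n_new) @ A @ overlap_compressor(m_old, m_new).T plays the same role as the matrix product in compress_and_average.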

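Then simply premultiply the original matrix by the row compressor and postmultiply the result by the column compressor to get the compressed matrix: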

def compress_and_average(array, new_shape):
    # Note: new_shape must not be larger than the old shape in either dimension.
    # Uses @ on plain ndarrays instead of np.mat, which was removed in NumPy 2.0.
    array = np.asarray(array, dtype=float)
    return (get_row_compressor(array.shape[0], new_shape[0])
            @ array
            @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
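One property worth checking (a short derivation added for illustration, not part of the original answer): the matrix method preserves the overall mean. Each old row is split across bins with total weight 1 and then divided by the bin size n/n', so every column of the n'×n row compressor R sums to n'/n; likewise every row of the m×m' column compressor C sums to m'/m. For an n×m input A:

$$\mathbf{1}_{n'}^{\mathsf T} R = \tfrac{n'}{n}\,\mathbf{1}_n^{\mathsf T}, \qquad C\,\mathbf{1}_{m'} = \tfrac{m'}{m}\,\mathbf{1}_m$$

$$\operatorname{mean}(RAC) = \frac{\mathbf{1}_{n'}^{\mathsf T} R A C\,\mathbf{1}_{m'}}{n'm'} = \frac{(n'/n)(m'/m)\,\mathbf{1}_n^{\mathsf T} A\,\mathbf{1}_m}{n'm'} = \frac{\mathbf{1}_n^{\mathsf T} A\,\mathbf{1}_m}{nm} = \operatorname{mean}(A)$$

In the example above, the input sums to 80 over 15 entries and the output sums to 32 over 6 entries, so both means equal 16/3 ≈ 5.333.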

Answered 2025-04-17 by Python大师

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for arrays with any number of dimensions:

import numpy as np


def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)
    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]
    """
    operation = operation.lower()
    if operation not in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    if any(c % d for d, c in zip(new_shape, ndarray.shape)):
        raise ValueError("New shape {} must divide old shape {}".format(
            new_shape, ndarray.shape))
    # Split each axis into (new size, block size), e.g. (10, 10) -> (5, 2, 5, 2).
    compression_pairs = [(d, c//d) for d, c in zip(new_shape,
                                                   ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    # Reduce the block axes from the last one backwards; after each reduction
    # the next block axis sits one position further left, at -(i+1).
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray
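NumPy reductions also accept a tuple of axes, so the loop of single-axis reductions can be collapsed into one call. The sketch below is a hypothetical variant (bin_ndarray_axes is not part of the answer) that reshapes the same way and then reduces all within-block axes, which land at the odd positions 1, 3, 5, ..., at once. It assumes each new dimension divides the corresponding old one and raises otherwise.

```python
import numpy as np


def bin_ndarray_axes(ndarray, new_shape, operation="sum"):
    """Hypothetical variant of bin_ndarray using a single multi-axis reduction."""
    if operation not in ("sum", "mean"):
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError(f"Shape mismatch: {ndarray.shape} -> {tuple(new_shape)}")
    # Interleave (new size, block size) pairs: (n0, f0, n1, f1, ...).
    pairs = []
    for new, old in zip(new_shape, ndarray.shape):
        if old % new:
            raise ValueError(f"{new} does not divide {old}")
        pairs.extend((new, old // new))
    blocks = ndarray.reshape(pairs)
    # The within-block axes f0, f1, ... sit at odd positions 1, 3, 5, ...
    block_axes = tuple(range(1, 2 * ndarray.ndim, 2))
    return getattr(blocks, operation)(axis=block_axes)


m = np.arange(100).reshape((10, 10))
print(bin_ndarray_axes(m, (5, 5), "sum"))

cube = np.arange(48).reshape((2, 4, 6))
print(bin_ndarray_axes(cube, (1, 2, 3), "mean"))  # 2x2x2 blocks averaged
```

The first call should print the same 5×5 matrix as the docstring example; the second shows the same idea on a 3D array.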

Answered 2025-04-17 by Python大师

Here is an example based on the answer you linked (for clarity):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])
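Why this reshape works: with NumPy's default C (row-major) index order, reshaping the 4×6 array to (2, 2, 3, 2) puts element a[2*i + p, 2*j + q] at position (i, p, j, q), so axes 0 and 2 index the blocks and axes 1 and 3 index positions inside a 2×2 block. The sketch below checks that mapping explicitly and shows that a single mean over axis=(1, 3) matches .mean(axis=3).mean(1).

```python
import numpy as np

a = np.arange(24).reshape((4, 6))
# 2 row-blocks x 2 rows per block x 3 column-blocks x 2 columns per block
a4 = a.reshape((2, 2, 3, 2))

# Verify the index mapping (i, p, j, q) -> (2*i + p, 2*j + q) for every element.
for i, p, j, q in np.ndindex(a4.shape):
    assert a4[i, p, j, q] == a[2 * i + p, 2 * j + q]

# Average over both within-block axes in one call.
b = a4.mean(axis=(1, 3))
print(b)
```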

As a function:

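def rebin(a, shape):
    # Reshape to (new_rows, row_factor, new_cols, col_factor), then average over
    # the two factor axes. Assumes shape divides a.shape exactly.
    sh = shape[0], a.shape[0]//shape[0], shape[1], a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)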

Answered 2025-04-17 by Python大师
