"Tiling" a 2D array with numpy

Question

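I am trying to reduce the size of a 2D array by splitting it into square chunks and writing one value per chunk into another array. The chunk size is variable, say n cells on a side, and the array's data type is integer. Right now I use a Python loop to assign each chunk to a temporary array and then pull the unique values out of that temporary array. I then loop over those unique values to find the one that occurs most often. As you can imagine, this becomes extremely slow as the input array grows.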

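I have seen examples that take the min, max, or mean of the chunks, but I don't know how to convert them to take the majority (most frequent) value. See "Grouping 2D numpy array in average" and "resize with averaging or rebin a numpy 2d array".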
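For context, the min/max/mean style of block reduction mentioned above can be sketched roughly as follows. This is only an illustration on made-up data, and it assumes both dimensions divide evenly by n; the variable names are mine, not taken from the linked questions.

```python
import numpy as np

# Made-up 4x4 input; n is the chunk edge length.
a = np.arange(16).reshape(4, 4)
n = 2  # assumption: both dimensions of `a` are exact multiples of n

# View the array as (block_row, row_in_block, block_col, col_in_block).
blocks = a.reshape(a.shape[0] // n, n, a.shape[1] // n, n)

# Reducing over the two "inside the block" axes gives one value per chunk.
block_mean = blocks.mean(axis=(1, 3))
block_max = blocks.max(axis=(1, 3))
block_min = blocks.min(axis=(1, 3))

print(block_mean)
print(block_max)
print(block_min)
```

These reductions work because mean, max, and min all accept an axis argument. A majority value has no such built-in reduction, which is why it needs a different approach.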

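I'm hoping to find some way to speed this up by using numpy to operate on the whole array at once. (When the input is too large to fit in memory, I can handle that part myself by switching to tiled sections of the array.)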

Thanks

# Snippet of my code
import numpy as np

# Pull a tmpArray representing one square chunk of my input array
kernel = sourceDs.GetRasterBand(1).ReadAsArray(int(sourceRow),
                                               int(sourceCol),
                                               int(numSourcePerTarget),
                                               int(numSourcePerTarget))
# Get a list of the unique values
uniques = np.unique(kernel)
curMajority = -1  # highest count seen so far; any real count beats it
for val in uniques:
    # Count how many cells in this chunk equal val
    numOccurrences = (kernel == val).sum()
    if numOccurrences > curMajority:
        ans = val
        curMajority = numOccurrences

# Write out our answer: the majority value itself, not its count
outBand.WriteArray(np.array([[ans]]), row, col)

#This is insanity!!!
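As a smaller first step, the inner loop over unique values could probably be replaced by a single np.unique call with return_counts=True. This is a minimal sketch on a made-up chunk: the helper name majority_of_block is hypothetical, and it still handles only one chunk at a time, so the outer Python loop over chunks remains.

```python
import numpy as np

def majority_of_block(kernel):
    """Return the most frequent value in one chunk (hypothetical helper)."""
    values, counts = np.unique(kernel, return_counts=True)
    # argmax returns the first maximum, so ties go to the smallest value.
    return values[counts.argmax()]

# Made-up 3x3 chunk: the value 4 occurs four times.
kernel = np.array([[1, 1, 4],
                   [4, 4, 5],
                   [9, 9, 4]])
ans = majority_of_block(kernel)
print(ans)
```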

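With Bago's excellent suggestions, I think I'm well on my way to a solution. Here is what I have so far. The only change I made was to use an (x*y, n*n) array built from the original grid's shape. The problem I'm running into is that I can't figure out how to translate the where, counts, and uniq_a steps from one dimension to two.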

#test data
grid = np.array([[37,  1,  4,  4,  6,  6,  7,  7],
                 [ 1, 37,  4,  5,  6,  7,  7,  8],
                 [ 9,  9, 11, 11, 13, 13, 15, 15],
                 [ 9, 10, 11, 12, 13, 14, 15, 16],
                 [17, 17, 19, 19, 21, 11, 23, 23],
                 [17, 18, 19, 20, 11, 22, 23, 24],
                 [25, 25, 27, 27, 29, 29, 31, 32],
                 [25, 26, 27, 28, 29, 30, 31, 32]])
print(grid)

n = 4
X, Y = grid.shape
x = X // n
y = Y // n
grid = grid.reshape( (x, n, y, n) )
grid = grid.transpose( [0, 2, 1, 3] )
grid = grid.reshape( (x*y, n*n) )
grid = np.sort(grid)
diff = np.empty((grid.shape[0], grid.shape[1]+1), bool)
diff[:, 0] = True
diff[:, -1] = True
diff[:, 1:-1] = grid[:, 1:] != grid[:, :-1]
where = np.where(diff)

#This is where it falls apart for me, as
#where returns two arrays:
# row indices [0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3]
# col indices [ 0  2  5  6  9 10 13 14 16  0  3  7  8 11 12 15 16  0  3  4  7  8 11 12 15
# 16  0  2  3  4  7  8 11 12 14 16]
#I'm not sure how to get a 
counts = where[:, 1:] - where[:, -1]
argmax = counts[:].argmax()
uniq_a = grid[diff[1:]]
print(uniq_a[argmax])
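One possible way to carry the where/counts/uniq_a idea over to two dimensions is sketched below. It is offered as an illustration, not as a definitive answer. It keeps the same sorted (x*y, n*n) layout and the same diff array, then uses the row indices returned alongside the column indices to keep runs from crossing chunk boundaries. The function name block_majority and the dense counts array are my own assumptions, and the sketch assumes the grid's shape is an exact multiple of n.

```python
import numpy as np

def block_majority(grid, n):
    """Majority value of each n-by-n chunk (sketch; shape must divide by n)."""
    X, Y = grid.shape
    x, y = X // n, Y // n
    # One row per chunk, sorted so that equal values form contiguous runs.
    blocks = grid.reshape(x, n, y, n).transpose(0, 2, 1, 3).reshape(x * y, n * n)
    blocks = np.sort(blocks, axis=1)

    # diff marks run boundaries; the extra last column closes the final run.
    diff = np.empty((x * y, n * n + 1), dtype=bool)
    diff[:, 0] = True
    diff[:, -1] = True
    diff[:, 1:-1] = blocks[:, 1:] != blocks[:, :-1]
    rows, cols = np.nonzero(diff)

    # Two consecutive boundaries delimit a run only if they share a row.
    same_row = rows[1:] == rows[:-1]
    run_rows = rows[:-1][same_row]
    run_starts = cols[:-1][same_row]
    run_lengths = (cols[1:] - cols[:-1])[same_row]

    # Scatter run lengths into a dense array so argmax can work per row.
    counts = np.zeros((x * y, n * n), dtype=int)
    counts[run_rows, run_starts] = run_lengths
    best = counts.argmax(axis=1)  # start of the longest run in each row
    return blocks[np.arange(x * y), best].reshape(x, y)

grid = np.array([[37,  1,  4,  4,  6,  6,  7,  7],
                 [ 1, 37,  4,  5,  6,  7,  7,  8],
                 [ 9,  9, 11, 11, 13, 13, 15, 15],
                 [ 9, 10, 11, 12, 13, 14, 15, 16],
                 [17, 17, 19, 19, 21, 11, 23, 23],
                 [17, 18, 19, 20, 11, 22, 23, 24],
                 [25, 25, 27, 27, 29, 29, 31, 32],
                 [25, 26, 27, 28, 29, 30, 31, 32]])
result = block_majority(grid, 4)
print(result)
```

Because each row is sorted and argmax returns the first maximum, ties between equally frequent values resolve to the smallest value. The dense counts array costs as much memory as the chunks themselves, which seems an acceptable trade for avoiding a Python loop.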



2 Answers
