Speeding up iteration over a NumPy array

Question

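I am using NumPy for image processing, specifically a running standard deviation stretch. The process reads a given number of columns, computes their standard deviation, and applies a percentage linear stretch. It then moves on to the next group of columns and repeats the same operations. The input image is a 1 GB, 32-bit, single-band raster, and processing it takes a very long time (hours). The code is below.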

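I realize that I have 3 nested for loops, which is presumably where the bottleneck is. If I process the image in "boxes", for example by loading a [500,500] array at a time, the processing time is much shorter. Unfortunately, because of a camera error, I have to process the image in extremely long strips (52,000 x 4) (y,x) to avoid banding.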

Any suggestions for speeding this up would be greatly appreciated:

import numpy
from osgeo import gdal

def box(dataset, outdataset, sampleSize, n):

    quiet = 0
    sample = sampleSize
    #iterate over all of the bands
    for j in xrange(1, dataset.RasterCount + 1): #1 based counter

        band = dataset.GetRasterBand(j)
        NDV = band.GetNoDataValue()

        print "Processing band: " + str(j)       

        #define the interval at which blocks are created
        intervalY = int(band.YSize/1)    
        intervalX = int(band.XSize/2000) #to be changed to sampleSize when working

        #iterate through the rows
        scanBlockCounter = 0

        for i in xrange(0,band.YSize,intervalY):

            #If the next i is going to fail due to the edge of the image/array
            if i + (intervalY*2) < band.YSize:
                numberRows = intervalY
            else:
                numberRows = band.YSize - i

            for h in xrange(0,band.XSize, intervalX):

                if h + (intervalX*2) < band.XSize:
                    numberColumns = intervalX
                else:
                    numberColumns = band.XSize - h

                scanBlock = band.ReadAsArray(h,i,numberColumns, numberRows).astype(numpy.float)

                standardDeviation = numpy.std(scanBlock)
                mean = numpy.mean(scanBlock)

                newMin = mean - (standardDeviation * n)
                newMax = mean + (standardDeviation * n)

                outputBlock = ((scanBlock - newMin)/(newMax-newMin))*255
                outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset


                scanBlockCounter = scanBlockCounter + 1
                #print str(scanBlockCounter) + ": " + str(scanBlock.shape) + str(h)+ ", " + str(intervalX)
                if numberColumns == band.XSize - h:
                    break

                #update progress line
                if not quiet:
                    gdal.TermProgress_nocb( (float(h+1) / band.XSize) )

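Here is an update: I did not use the profile module, because I did not want to start wrapping small sections of code in functions. Instead, I used a mix of print and exit statements to get a rough idea of which lines were taking the most time. Luckily (and I do know how lucky I was), a single line was dragging everything down.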

    outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset
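
For finding a line like this without restructuring the script, a small timing context manager is one option. This is a minimal sketch, assuming Python 3 and only the standard library; the `timed` helper, the `totals` dictionary, and the labels are hypothetical names, and the loop body is a stand-in for the real compute and write steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical helper: accumulates wall-clock seconds per label.
totals = defaultdict(float)

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start

# Stand-ins for the real steps inside the processing loop.
for _ in range(3):
    with timed("compute"):
        sum(x * x for x in range(10000))
    with timed("write"):
        time.sleep(0.01)

# Report the labels, slowest first.
for label, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print("%-8s %.4f s" % (label, seconds))
```

Wrapping the `WriteArray` call and the NumPy arithmetic in separate `with timed(...)` blocks would show how the total time splits between them.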

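It appears that GDAL is quite inefficient at opening the output file and writing out the array. With that in mind, I decided to append each modified array (`outputBlock`) to a Python list and then write the arrays out in chunks. Here is the segment that I changed: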

The outputBlock was just modified ...

    #Add the array to a list
    outputArrayList.append(outputBlock)

    #Check the interval counter and, if it is "time", write out the arrays
    if len(outputArrayList) >= (intervalX * writeSize) or finisher == 1:

        #Horizontally stack the list of 2-D arrays into one 2-D array
        stacked = numpy.hstack(outputArrayList)

        #Write out the array
        outRaster = outdataset.GetRasterBand(j).WriteArray(stacked,xOffset,i)#array, xOffset, yOffset
        xOffset = xOffset + (intervalX*(intervalX * writeSize))

        #Cleanup to conserve memory
        outputArrayList = list()
        stacked = None
        finisher=0

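Finisher is simply a flag that handles the edges. It took some time to figure out how to build an array from the list. Using numpy.array created a 3-D array (can anyone explain why?), while WriteArray requires a 2-D array. Total processing time now varies from just under 2 minutes to 5 minutes. Does anyone know why the times vary this much?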
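
On the 3-D array: when `numpy.array` is given a list of equally shaped 2-D arrays, it treats the list itself as a new leading axis, so N blocks of shape (rows, cols) become one array of shape (N, rows, cols). `numpy.hstack` instead joins the blocks along the existing column axis. A small sketch with made-up block sizes (the 5 x 4 blocks are an assumption for illustration only):

```python
import numpy

# Three hypothetical blocks, each 5 rows by 4 columns.
blocks = [numpy.full((5, 4), k, dtype=float) for k in range(3)]

as_array = numpy.array(blocks)   # the list becomes a new leading axis
stacked = numpy.hstack(blocks)   # the blocks are joined side by side

print(as_array.shape)  # (3, 5, 4)
print(stacked.shape)   # (5, 12)
```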

Many thanks to everyone who posted! The next step is to dig deeper into NumPy and learn about vectorization for further optimization.
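
As a starting point for that vectorization, here is a rough sketch of the same strip-wise stretch done without Python loops. It is not a drop-in replacement: it assumes Python 3, that the whole band is already in memory as a 2-D array, that the column count is a multiple of the strip width, and that NoData values can be ignored. The name `strip_stretch` and the test image are hypothetical. Like the code above, it does not clip the result to 0..255, and a strip with zero standard deviation would divide by zero.

```python
import numpy

def strip_stretch(image, width, n):
    """Stretch each vertical strip of `width` columns to mean +/- n*std."""
    rows, cols = image.shape
    if cols % width:
        raise ValueError("cols must be a multiple of width in this sketch")
    # View the image as (rows, number_of_strips, width).
    strips = image.reshape(rows, cols // width, width).astype(numpy.float64)
    # One mean and one standard deviation per strip, kept broadcastable.
    mean = strips.mean(axis=(0, 2), keepdims=True)
    std = strips.std(axis=(0, 2), keepdims=True)
    new_min = mean - n * std
    new_max = mean + n * std
    out = (strips - new_min) / (new_max - new_min) * 255
    return out.reshape(rows, cols)

# Usage with synthetic data: 1000 rows, 16 columns, strips 4 columns wide.
rng = numpy.random.default_rng(0)
image = rng.normal(100.0, 20.0, size=(1000, 16)).astype(numpy.float32)
result = strip_stretch(image, width=4, n=2.0)
print(result.shape)  # (1000, 16)
```

Because the float64 copy doubles the memory of a 32-bit band, a 1 GB raster may still need to be read in groups of strips rather than all at once.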

3 Answers
