Is there a function that does the same thing as numpy's "column_stack"?

Question

I am using Python 2.7 on 32-bit Windows Vista.

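I have code that reads radiances, latitude, and longitude from an image file with the .hdf extension. I then want to run an approximate nearest neighbor calculation and map the result. However, a memory error occurs when the code attempts the approximate nearest neighbor step.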

The HDF file itself is 4.70 MB, which does not seem very large.

Here is my code:

if __name__=="__main__":

    filename = ...  # the HDF file I have
    cumData, z = readAIRS_L1_VIS(filename)

    x, y = get_lat_lon(filename)  

    x0, xn = int(x.min()+1), int(x.max())
    y0, yn = int(y.min()+1), int(y.max())

    ncol = xn - x0 + 1
    nrow = yn - y0 + 1

    X, Y = np.meshgrid(np.arange(x0, xn+1), np.arange(y0, yn+1))
    img = interp_knn(np.column_stack((x.ravel(), y.ravel())),
            z.ravel(), np.column_stack((X.ravel(), Y.ravel())))
    img.shape = (nrow, ncol)

My functions and imported libraries are:

from pyhdf.SD import SD
import scipy as sc
import numpy as np
import pylab, os
import pyproj as proj
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import scikits.ann as ann

def readAIRS_L1_VIS(filename, variable=None):
    """
    Read an HDF file for AIRS Level 1B VIS.

    input : AIRS HDF file
    input : variables parameter (optional, default = radiances)

    returns dictionary with data and meta, plus the data stacked as an array
    """
    allz = []

    if not os.path.exists(filename):
        # String exceptions are not allowed in Python 2.6+, so raise a real exception.
        raise IOError("Invalid Filepath")
    reader = SD(filename)
    aVariables = reader.datasets().keys()
    if variable is None:
        variable = 'radiances'
    elif variable not in aVariables:
        raise ValueError("Invalid Variable Specified")

    data = reader.select(variable).get()
    allz.append(data)
    outDict = {'Variable': variable, 'filename': filename.split('/')[-1], 'data': data}
    return outDict, np.vstack(allz)

This is the get_lat_lon function:

def get_lat_lon(path):
    allx = []
    ally = []
    reader = SD(path)
    lat = reader.select('Latitude').get()
    lon = reader.select('Longitude').get()    
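    # Proj is not defined in the code shown; presumably a pyproj projection object created elsewhere.
    x, y = Proj(lon, lat)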
    x /= 1000.0
    y /= 1000.0

    allx.append(x)
    ally.append(y)
    return np.vstack(allx),np.vstack(ally)

This is the interp_knn function (the approximate nearest neighbor algorithm):

def interp_knn(data, z, p):
    print "building kdtree"
    k = ann.kdtree(data)
    print "kdtree lookup..."
    ind, dist = k.knn(p, 1)
    print "done"
    img = z[ind[:,0]]
    img[dist[:,0] > 15] = np.nan  # numpy is imported as np; N was undefined
    return img

The error I get is:

Traceback (most recent call last):
  File "....\read_HDF5.py", line 166, in <module>
    z.ravel(), np.column_stack((X.ravel(), Y.ravel())))
  File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 296, in column_stack
    return _nx.concatenate(arrays,1)
MemoryError

So is column_stack causing this error? If so, how can I fix it? Any advice would be appreciated.
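One possible direction, sketched here rather than tested against the real data, is to build the query points one block of grid rows at a time so that the full stacked array never exists in memory. In this sketch `interp_nn_chunked`, `brute_force_nn`, `rows_per_chunk`, and `max_dist` are hypothetical names, and `brute_force_nn` is only a small numpy stand-in for the kd-tree lookup; it is not the scikits.ann API and would be far too slow for the real grid.

```python
import numpy as np

def brute_force_nn(data, p):
    """Stand-in nearest-neighbour query: returns (index, distance) per query point."""
    # Pairwise squared distances, shape (len(p), len(data)); only sensible for small inputs.
    d2 = ((p[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    ind = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(p)), ind])
    return ind, dist

def interp_nn_chunked(data, z, xs, ys, query=brute_force_nn,
                      max_dist=15.0, rows_per_chunk=64):
    """Nearest-neighbour gridding that builds query points one row block at a time.

    data : (n, 2) array of sample coordinates
    z    : 1-D array of n sample values
    xs, ys : 1-D arrays of grid coordinates along each axis
    """
    img = np.empty((len(ys), len(xs)), dtype=np.float32)
    for start in range(0, len(ys), rows_per_chunk):
        stop = min(start + rows_per_chunk, len(ys))
        # Only this block of grid rows is expanded into query points.
        X, Y = np.meshgrid(xs, ys[start:stop])
        p = np.column_stack((X.ravel(), Y.ravel())).astype(np.float64)
        ind, dist = query(data, p)
        block = z[ind].astype(np.float32)
        # Mask grid cells whose nearest sample is farther than max_dist.
        block[dist > max_dist] = np.nan
        img[start:stop] = block.reshape(stop - start, len(xs))
    return img
```

The output image still has to fit in memory, which is why the sketch stores it as float32. If even that is too large, a coarser grid step would shrink everything proportionally.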

Edit:

I added this code to print each value:

print "x:",x
print "x.shape:",x.shape
print "y:",y
print "y.shape:",y.shape
print "X:",X
print "X.shape",X.shape
print "Y:",Y
print "Y.shape",Y.shape
print "x0:",x0    
print "xn:",xn    
print "y0:",y0    
print "yn:",yn

And I got these results:

x: [[ 10424.20322635  10454.76060099  10485.45730949 ..., -12968.67726035
-12685.76602721 -12375.06502138]
[ 10382.59291927  10412.4034849   10442.35640928 ..., -12992.35321415
-12700.8632597  -12380.48805381]
[ 10340.74366218  10369.79366321  10398.98895233 ..., -13017.45507334
-12716.86098332 -12386.19350493]
..., 
[  5327.05493943   5275.15394042   5223.90854331 ...,   1918.57476975
1821.32106295   1717.34665908]
[  5303.06157859   5251.14693111   5199.89936454 ...,   1914.50352498
1818.19581363   1715.23546366]
[  5280.12577523   5226.55972784   5176.11746996 ...,   1910.4792526
1815.09866674   1714.77978295]]
x.shape: (135, 90)
y: [[ 8049.59989276  8099.28303285  8147.42741851 ...,  9925.58168202
9933.46845934  9937.89861612]
[ 8056.91586464  8106.78261584  8155.11136874 ...,  9953.01973235
9961.14109569  9965.68870206]
[ 8064.04624932  8114.09204498  8162.60060337 ...,  9980.50394667
9988.87543224  9993.54921283]
..., 
[ 7258.03197692  7292.42166577  7325.40914928 ...,  8225.26655004
8228.18675519  8230.16218915]
[ 7242.59306102  7276.75919255  7309.52794297 ...,  8201.49165135
8204.39528226  8206.36728948]
[ 7226.54007095  7261.56601577  7293.59601515 ...,  8177.75663252
8180.64399766  8182.58727191]]
y.shape: (135, 90)
X: [[-14149 -14148 -14147 ...,  14166  14167  14168]
[-14149 -14148 -14147 ...,  14166  14167  14168]
[-14149 -14148 -14147 ...,  14166  14167  14168]
..., 
[-14149 -14148 -14147 ...,  14166  14167  14168]
[-14149 -14148 -14147 ...,  14166  14167  14168]
[-14149 -14148 -14147 ...,  14166  14167  14168]]
X.shape (3635, 28318)
Y: [[ 7227  7227  7227 ...,  7227  7227  7227]
[ 7228  7228  7228 ...,  7228  7228  7228]
[ 7229  7229  7229 ...,  7229  7229  7229]
..., 
[10859 10859 10859 ..., 10859 10859 10859]
[10860 10860 10860 ..., 10860 10860 10860]
[10861 10861 10861 ..., 10861 10861 10861]]
Y.shape (3635, 28318)
x0: -14149
xn: 14168
y0: 7227
yn: 10861
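For scale, the grid size implied by these printed bounds can be checked with some quick arithmetic. The sketch below is illustrative only: the item sizes (4 bytes for a 32-bit integer, 8 bytes for a 64-bit float) are assumptions about which dtype the stacked array ends up with, and `stacked_bytes` is a hypothetical helper, not part of numpy.

```python
# Grid bounds copied from the printed output above.
x0, xn = -14149, 14168
y0, yn = 7227, 10861

ncol = xn - x0 + 1        # number of grid columns
nrow = yn - y0 + 1        # number of grid rows
n_points = nrow * ncol    # total grid points passed to the lookup

def stacked_bytes(n_points, itemsize):
    """Bytes needed for an (n_points, 2) array with the given item size."""
    return n_points * 2 * itemsize

MIB = 1024.0 ** 2
mib_int32 = stacked_bytes(n_points, 4) / MIB     # assuming 32-bit integers
mib_float64 = stacked_bytes(n_points, 8) / MIB   # assuming 64-bit floats
```

Under those assumptions the stacked query array alone would need roughly 785 MiB as 32-bit integers, or about 1.5 GiB as 64-bit floats, on top of X and Y themselves. A 32-bit Windows process normally has only 2 GB of user address space, so an allocation of that size can plausibly fail even though the HDF file is small.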


2 Answers
