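How do I save a KDTree object in Python?
I am using SciPy's KDTree implementation to read a large 300 MB file. Is there a way to save the data structure to disk and load it again, or am I stuck reading the raw points from the file and rebuilding the data structure every time I start my program? I construct the KDTree as follows: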
def buildKDTree(self):
    self.kdpoints = numpy.fromfile("All", sep=' ')
    self.kdpoints.shape = self.kdpoints.size / self.NDIM, self.NDIM
    self.kdtree = KDTree(self.kdpoints, leafsize = self.kdpoints.shape[0]+1)
    print "Preparing KDTree... Ready!"
Any suggestions?
1 Answer
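KDTree uses nested classes to define its node types (innernode and leafnode). Under Python 2, as used here, pickle only works with class definitions at module level, so a nested class trips it up: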
import cPickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
print obj.__class__
cPickle.dumps(obj)

Output:
<class '__main__.Bar'>
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
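For readers on Python 3, the picture is slightly different, so here is a rough sketch of the same lookup behaviour there. It is not SciPy code; `Outer`, `Inner`, and `make_local_class` are made-up names for illustration. As far as I can tell, Python 3 pickles a class by its qualified name, so a nested class round-trips, while a class defined inside a function still cannot be found by name:

```python
import pickle


class Outer:
    class Inner:  # nested class, analogous to KDTree.node above (hypothetical names)
        pass


def make_local_class():
    class Local:  # defined inside a function: no importable name
        pass
    return Local


# Python 3 records the qualified name "Outer.Inner", so this round-trips.
nested = pickle.loads(pickle.dumps(Outer.Inner()))

# A function-local class cannot be located by name, so pickling fails.
# The exact exception type has varied between Python versions, hence both.
try:
    pickle.dumps(make_local_class()())
    local_failed = False
except (pickle.PicklingError, AttributeError):
    local_failed = True

print(type(nested).__qualname__, local_failed)
```

The point is only that pickle stores a reference to the class (module plus name) rather than the class itself, so whatever name it records must be resolvable at load time.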
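There is, however, a (hacky) workaround: monkey-patch the class definitions into the scipy.spatial.kdtree module scope so that the pickler can find them. If all of your code that reads and writes pickled KDTree objects installs these patches, this hack should work fine: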
import cPickle
import numpy
from scipy.spatial import kdtree
# patch module-level attribute to enable pickle to work
kdtree.node = kdtree.KDTree.node
kdtree.leafnode = kdtree.KDTree.leafnode
kdtree.innernode = kdtree.KDTree.innernode
x, y = numpy.mgrid[0:5, 2:8]
t1 = kdtree.KDTree(zip(x.ravel(), y.ravel()))
r1 = t1.query([3.4, 4.1])
raw = cPickle.dumps(t1)
# read in the pickled tree
t2 = cPickle.loads(raw)
r2 = t2.query([3.4, 4.1])
print t1.tree.__class__
print repr(raw)[:70]
print t1.data[r1[1]], t2.data[r2[1]]
Output:
<class 'scipy.spatial.kdtree.innernode'>
"ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_
[3 4] [3 4]
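The example above only round-trips through an in-memory string. To actually keep the tree on disk between runs, something along these lines might work. This is a minimal sketch under several assumptions: Python 3, a SciPy version in which `cKDTree` can be pickled directly (so no patching is attempted here), random points standing in for the real 300 MB data set, and a hypothetical file name `kdtree.pkl`:

```python
import os
import pickle
import tempfile

import numpy as np
from scipy.spatial import cKDTree

# Stand-in for the points parsed from the large input file (assumption).
rng = np.random.default_rng(0)
points = rng.random((1000, 3))

tree = cKDTree(points)

# Hypothetical cache location; a real program would choose a stable path.
path = os.path.join(tempfile.mkdtemp(), "kdtree.pkl")

# Save the built tree once...
with open(path, "wb") as f:
    pickle.dump(tree, f, protocol=pickle.HIGHEST_PROTOCOL)

# ...and load it on later runs instead of rebuilding from the raw points.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# Both trees should give the same nearest neighbour for the same query.
query = [0.5, 0.5, 0.5]
dist_a, idx_a = tree.query(query)
dist_b, idx_b = loaded.query(query)
```

Whether loading the pickle is actually faster than rebuilding depends on the data and the SciPy version, so it is worth timing both. As with any pickle, the file should only be loaded from a trusted source.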