Is there a way to pickle a scipy.interpolate.Rbf() object?

Asked 2025-04-18 08:17

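I'm building a radial basis function interpolation model for a fairly large dataset. The main call, `scipy.interpolate.Rbf(,)`, takes about one minute and 14 GB of RAM. Not every machine this is supposed to run on can handle that, and the program will run on the same dataset very often, so I would like to save the result to a file. Here is a simplified example: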

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)
RBFpickler.dump(rbfi)
RBFfile.close()

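The call to RBFpickler.dump() fails with the error `can't pickle <type 'instancemethod'>`. As I understand it, this means the object contains a method somewhere (well, rbfi() is callable), and that method cannot be pickled for some reason I don't quite understand.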

Does anyone know another way to pickle this, or some other way to save the result of the inter.Rbf() call?

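The object contains some arrays of shape (nd,n) and (n,n) (rbfi.A, rbfi.xi, rbfi.di...), which I assume hold all the useful information. I suppose I could save just those arrays, but I'm not sure how to put the object back together afterwards...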

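Edit: One additional constraint: I am not allowed to install additional libraries on the system. The only way I can use them is if they are pure Python and I can include them with the script without compiling anything.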

2 Answers

OK, Mike's solution looks good, but I found another approach in the meantime:

Only two parts of an Rbf object cannot be pickled directly, and both are easy to recreate from scratch. So my code now saves only the data parts:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)

# RBF can't be pickled directly, so save everything required for reconstruction
RBFdict = {}
for key in rbfi.__dict__.keys():
    if key != '_function' and key != 'norm':
        RBFdict[key] = getattr(rbfi, key)

RBFpickler.dump(RBFdict)
RBFfile.close()

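This gives me a file containing all the information stored in the object. rbfi._function() and rbfi.norm are not saved. Fortunately, they can be recreated from scratch simply by initializing an arbitrary (simple) Rbf object: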

## create a bare-bones RBF object ##
rbfi = inter.Rbf(np.array([1,2,3]), np.array([10,20,30]), \
                      np.array([1,2,3]), function = RBFdict['function'] )

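Then the data parts of this object are replaced with the saved data. (In a fresh session, load RBFdict first, since the constructor call above reads RBFdict['function'].)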

RBFfile = open('picklefile','rb')
RBFunpickler = cPickle.Unpickler(RBFfile)
RBFdict = RBFunpickler.load()
RBFfile.close()

## replace rbfi's contents with what was saved ##
for key,value in RBFdict.iteritems():
    rbfi.__setattr__(key, value)

>>> rbfi(2,3,4)
array(1.4600661386382146)

Evidently the new Rbf object does not even need the same number of dimensions as the original, since everything gets overwritten.

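Mike's solution is probably more generally applicable, whereas this approach is more platform-independent. I have had problems moving pickled Kriging models between platforms, but this approach for RBF models seems more robust. I haven't tested it much, though, so no guarantees.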
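For readers who want the pattern in isolation, here is a minimal sketch of the same "save the data, rebuild the callable" idea using only the standard library. ToyInterp, SKIP, dump_state and load_state are hypothetical names made up for this illustration; ToyInterp is a stand-in and not SciPy's Rbf, so the attribute names to skip on a real Rbf object should be checked against your SciPy version.

```python
import io
import math
import pickle


class ToyInterp:
    """Hypothetical stand-in for an Rbf-like object (not SciPy's class)."""

    def __init__(self, nodes, weights, function="gaussian"):
        self.nodes = list(nodes)
        self.weights = list(weights)
        self.function = function
        # A locally created callable: plain pickle refuses to serialize it.
        self._function = self._make_function(function)

    @staticmethod
    def _make_function(name):
        if name == "gaussian":
            return lambda r: math.exp(-r * r)
        return lambda r: r  # fallback: "linear"

    def __call__(self, x):
        return sum(w * self._function(abs(x - n))
                   for n, w in zip(self.nodes, self.weights))


SKIP = ("_function",)  # attributes that are rebuilt instead of saved


def dump_state(obj, fileobj):
    """Pickle only the plain-data attributes of obj."""
    state = {k: v for k, v in vars(obj).items() if k not in SKIP}
    pickle.dump(state, fileobj, protocol=2)


def load_state(fileobj):
    """Build a bare-bones object, then overwrite its data with the saved state."""
    state = pickle.load(fileobj)
    # Throwaway data; only `function` matters for recreating the callable.
    obj = ToyInterp([0.0], [0.0], function=state["function"])
    for key, value in state.items():
        setattr(obj, key, value)
    return obj


original = ToyInterp([0.0, 1.0, 2.0], [1.0, -0.5, 0.25])
buffer = io.BytesIO()
dump_state(original, buffer)
buffer.seek(0)
restored = load_state(buffer)
print(restored(0.7) == original(0.7))
```

Because only a dictionary of plain data is written, the file does not depend on how the callable is represented in memory, which is presumably why this approach moves between platforms more easily.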

Answered 2025-04-18 by Python大师

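I would use dill to serialize the result... or, if you want a cached function, you could use klepto to cache the function call so that re-evaluation of the function is minimized.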

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy.interpolate as inter
>>> import numpy as np
>>> import dill
>>> import klepto
>>> 
>>> x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
>>> y = np.array([1,2,3,4])
>>> 
>>> # build an on-disk archive for numpy arrays,
>>> # with a dictionary-style interface  
>>> p = klepto.archives.dir_archive(serialized=True, fast=True)
>>> # add a caching algorithm, so when threshold is hit,
>>> # memory is dumped to disk
>>> c = klepto.safe.lru_cache(cache=p)
>>> # decorate the target function with the cache
>>> c(inter.Rbf)
<function Rbf at 0x104248668>
>>> rbf = _
>>> 
>>> # 'rbf' is now cached, so all repeat calls are looked up
>>> # from disk or memory
>>> d = rbf(x[:,0], x[:,1], x[:,2], y)
>>> d
<scipy.interpolate.rbf.Rbf object at 0x1042454d0>
>>> d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>>

Continuing...

>>> # the cache is serializing the result object behind the scenes
>>> # it also works if we directly pickle and unpickle it
>>> _d = dill.loads(dill.dumps(d))
>>> _d
<scipy.interpolate.rbf.Rbf object at 0x104245510>
>>> _d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>>

Get klepto and dill here: https://github.com/uqfoundation
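If extra libraries are not an option, the caching idea itself can be roughly sketched with the standard library. The following is not klepto's API: disk_cached and expensive_fit are hypothetical names for this illustration, and the sketch assumes that both the arguments and the result can be handled by plain pickle (so for an Rbf object you would cache its data attributes, not the object itself).

```python
import hashlib
import os
import pickle
import tempfile


def disk_cached(cache_dir):
    """Decorator: store each result in cache_dir, keyed by a hash of the arguments."""
    def decorate(func):
        def wrapper(*args):
            # Hash the pickled arguments to get a stable file name.
            key = hashlib.sha256(pickle.dumps(args, protocol=2)).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                # Repeat call: read the stored result instead of recomputing.
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args)
            with open(path, "wb") as fh:
                pickle.dump(result, fh, protocol=2)
            return result
        return wrapper
    return decorate


calls = []  # records how often the wrapped function really runs
cache_dir = tempfile.mkdtemp()


@disk_cached(cache_dir)
def expensive_fit(points):
    """Hypothetical placeholder for a slow model-building call."""
    calls.append(points)
    return {"n": len(points), "total": sum(points)}


first = expensive_fit((1, 2, 3))
second = expensive_fit((1, 2, 3))  # served from the file written by the first call
print(len(calls))
```

Unlike klepto's lru_cache, this sketch keeps nothing in memory and never evicts entries; it only shows the lookup-before-compute step.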

Answered 2025-04-18 by Python大师
