How to retrieve and apply a CatBoost binary model file saved in MongoDB with GridFS

Posted 2024-04-19 00:51:17


I am able to save a large CatBoost model in MongoDB with GridFS using the following code:

import gridfs
from bson import Binary

# Export and save model as a file
model.save_model(file_path_model, format='cbm')

# Load exported file and insert in Mongo
with open(file_path_model, mode='rb') as file:  # 'b' is important -> binary mode
    file_content_model = file.read()
binary_model = Binary(file_content_model)

# Get the GridFS object (the data will be saved in fs.chunks)
fs = gridfs.GridFS(self.database)

# Store binary file in MongoDB using GridFS
new_id = fs.put(binary_model)
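GridFS.put accepts a bytes object (or a file-like object) directly, and bson.Binary is itself a bytes subclass, so the Binary wrapper is not strictly required. The sketch below reads the exported .cbm file as raw bytes and stores it, assuming fs is a gridfs.GridFS instance. store_model_file is a hypothetical helper name; the filename keyword is saved as a field on the fs.files document so the model can later be looked up by name.

```python
from pathlib import Path


def store_model_file(fs, model_path, filename=None):
    """Store a saved CatBoost .cbm file in GridFS and return its file id.

    fs is assumed to be a gridfs.GridFS instance. The file is read in
    binary mode so every byte of the model is stored unchanged.
    """
    data = Path(model_path).read_bytes()  # binary read, no text decoding
    # Extra keyword arguments to put() become fields of the fs.files document.
    return fs.put(data, filename=filename or Path(model_path).name)
```

Usage, with the names from the snippet above: new_id = store_model_file(fs, file_path_model).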

I can also retrieve the binary file from MongoDB using its GridFS ObjectId:

import gridfs
from bson import ObjectId

db = modelDBStorageManager.database
fs = gridfs.GridFS(db)
bin_model = fs.get(ObjectId(document['_id'])).read()
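GridOut.read() returns the stored content as bytes, and those bytes are already a complete .cbm file. A minimal sketch of a lookup helper, assuming fs is a gridfs.GridFS instance; fetch_model_bytes is a hypothetical name. Lookup by filename uses GridFS.get_last_version and only works if a filename was recorded when the file was stored (the fs.put call above does not pass one).

```python
def fetch_model_bytes(fs, file_id=None, filename=None):
    """Return the raw .cbm bytes stored in GridFS.

    fs is assumed to be a gridfs.GridFS instance. Either the ObjectId
    returned by fs.put() or a stored filename can be used for the lookup.
    """
    if file_id is not None:
        grid_out = fs.get(file_id)
    elif filename is not None:
        # Newest upload with this filename; requires filename=... at put() time.
        grid_out = fs.get_last_version(filename=filename)
    else:
        raise ValueError("pass either file_id or filename")
    data = grid_out.read()
    if not isinstance(data, bytes):
        raise TypeError(f"expected bytes from read(), got {type(data).__name__}")
    # Keep the result as bytes: converting it with str() would corrupt the model.
    return data
```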

But what I want to do is convert the retrieved binary back into a CatBoost model so that I can apply it to new data.
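One way to turn the retrieved bytes back into a usable model is sketched below. It is a minimal sketch under stated assumptions: model_from_bytes is a hypothetical helper, and model_factory is any zero-argument callable returning an unfitted model (for example catboost.CatBoostClassifier). Recent CatBoost documentation lists stream and blob parameters for load_model; the version in the traceback below only accepts fname and format, so the sketch falls back to a temporary file written in binary mode when blob= raises TypeError. Catching TypeError this broadly is a simplification.

```python
import os
import tempfile


def model_from_bytes(model_bytes, model_factory):
    """Rebuild a model from raw .cbm bytes (e.g. the result of GridOut.read()).

    model_factory: zero-argument callable returning an object with a
    CatBoost-style load_model method (assumption for illustration).
    """
    if not isinstance(model_bytes, (bytes, bytearray)):
        raise TypeError(f"expected raw bytes, got {type(model_bytes).__name__}")
    model = model_factory()
    try:
        # Newer CatBoost releases can load the serialized model directly.
        return model.load_model(blob=bytes(model_bytes))
    except TypeError:
        # Older releases only take a file name: fall through to a temp file.
        pass
    fd, path = tempfile.mkstemp(suffix=".cbm")
    try:
        with os.fdopen(fd, "wb") as fh:  # "wb": bytes are written verbatim
            fh.write(model_bytes)
        # load_model returns the model itself, so it can be returned directly.
        return model.load_model(path, format="cbm")
    finally:
        os.remove(path)
```

Usage, assuming bin_model is the bytes object from the retrieval step and new_data is a hypothetical feature table: model = model_from_bytes(bin_model, CatBoostClassifier), then model.predict(new_data).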

I tried writing the model to a file and loading it back with CatBoost's .load_model() function:

# Saving the model
def save_binary_file(bin_model):
    model1 = str(bin_model)
    fo = open("./Catboost_binary_files/binary.cbm", "w")
    fo.write(model1)
    fo.close()

save_binary_file(bin_model)

# Trying to load back the model
from_file = CatBoostClassifier()
model = from_file.load_model("./Catboost_binary_files/binary.cbm", format = 'cbm')

I get the following error:

---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-21-35e2109c72ed> in <module>
      1 from_file = CatBoostClassifier()
      2 
----> 3 model = from_file.load_model("./Catboost_binary_files/binary.cbm", format = 'cbm')

~/opt/anaconda2/envs/fsbo-fraud-catboost-py37/lib/python3.7/site-packages/catboost/core.py in load_model(self, fname, format)
   2587         if not isinstance(fname, STRING_TYPES):
   2588             raise CatBoostError("Invalid fname type={}: must be str().".format(type(fname)))
-> 2589         self._load_model(fname, format)
   2590         return self
   2591 

~/opt/anaconda2/envs/fsbo-fraud-catboost-py37/lib/python3.7/site-packages/catboost/core.py in _load_model(self, model_file, format)
   1313 
   1314     def _load_model(self, model_file, format):
-> 1315         self._object._load_model(model_file, format)
   1316         self._set_trained_model_attributes()
   1317         for key, value in iteritems(self._get_params()):

_catboost.pyx in _catboost._CatBoost._load_model()

_catboost.pyx in _catboost._CatBoost._load_model()

CatBoostError: catboost/libs/model/model.cpp:648: Incorrect model file descriptor

There seems to be a problem with the file format.
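The error is consistent with what save_binary_file does to the bytes: str(bin_model) does not decode the model, it returns the printable representation of the bytes object (text starting with b'), and open(..., "w") then writes that text. The file on disk is therefore no longer a .cbm model. Below is a minimal sketch of a corrected writer, assuming bin_model is the bytes object returned by fs.get(...).read(); the output path is passed in as an argument for illustration, and the sample bytes are a hypothetical stand-in, not a real model.

```python
import os
import tempfile


def save_binary_file(bin_model, path):
    # Open in binary mode ("wb") and write the bytes object as-is.
    # Do NOT call str(bin_model): that yields the repr text "b'\x00...'",
    # which is not the model file content.
    with open(path, "wb") as fo:
        fo.write(bin_model)


# Hypothetical sample bytes standing in for a real .cbm payload.
sample = b"\x00\x01CBM\xff"

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "binary.cbm")
    save_binary_file(sample, good_path)
    with open(good_path, "rb") as fh:
        round_trip = fh.read()

print(round_trip == sample)            # True: bytes survive unchanged
print(str(sample).encode() == sample)  # False: str() produces the repr text
```

With the file written this way, the original CatBoostClassifier().load_model(path, format='cbm') call should receive an unmodified .cbm file.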

