Slow inference time on ML Engine

Published 2021-05-13 14:19:35


From the Google Cloud documentation:

Cloud Machine Learning Engine online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service and it returns your predictions in the response. You can learn more general information about online prediction with the other prediction concepts.

https://cloud.google.com/ml-engine/docs/tensorflow/online-predict

I have to say my experience here is still limited. Don't get me wrong, I like ML Engine and I've done plenty of training on it, but in my experience the inference time is absurd.

I built a small semantic segmentation network for 256x256 images. I use Estimators, export the model to the SavedModel format, and then run predictions with this code:

import tensorflow as tf

# Load the exported SavedModel and build a local prediction function
predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir=saved_model_dir,
    signature_def_key="prediction"
)

On my own computer this gives me an inference time of roughly 0.9 seconds. I tried the same code on a Datalab instance with machine type n1-highmem-2, which gave me about 2.3 seconds per inference.
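When comparing numbers like these, it helps to separate the one-off warm-up cost of the first call from the steady-state latency. A minimal timing harness in pure Python (the `predict` argument stands in for any prediction callable such as the `predictor_fn` above; nothing here is specific to TensorFlow):

```python
import time

def time_predictions(predict, payload, runs=10):
    """Time a prediction callable.

    Returns (first_call_seconds, mean_steady_state_seconds), so that
    a slow first call (graph building, model loading) is reported
    separately from the per-request cost.
    """
    # Time the very first call on its own
    start = time.perf_counter()
    predict(payload)
    first = time.perf_counter() - start

    # Mean over several warm runs
    start = time.perf_counter()
    for _ in range(runs):
        predict(payload)
    mean = (time.perf_counter() - start) / runs
    return first, mean
```

Measuring first-call versus steady-state latency this way would show whether the slowdown reported below is a cold-start effect or a genuine per-request cost.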

Now, if I put the SavedModel on ML Engine

[deployment code block missing from the copied post]

with the following serving input function:

def parse_incoming_tensors(incoming):
    # Reshape the flat input into an NHWC image batch and apply VGG16 normalization
    img = vgg16_normalize(tf.reshape(incoming, [-1, 256, 256, 3]))
    return img


def serving_input_fn_web():
    """Input function to use when serving the model on ML Engine."""
    # Requests arrive as base64-encoded strings, one per instance
    inputs = tf.placeholder(tf.string, shape=(None,))
    feature_input = batch_base64_to_tensor(inputs)
    feature_input = {'img': parse_incoming_tensors(feature_input)}

    return tf.estimator.export.ServingInputReceiver(feature_input, inputs)
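Since the placeholder is a batch of `tf.string` scalars, the JSON body sent to ML Engine should carry one base64 string per instance. A sketch of that shape with a made-up payload (the `'input'` key is assumed to match the input alias used at export time, and the image bytes here are fake):

```python
import base64
import json

# Hypothetical image bytes; in the real setup these would be a resized 256x256 image
fake_image_bytes = b"\x00\x01\x02"

# One dict per instance, each holding a web-safe base64 string
body = {
    "instances": [
        {"input": base64.urlsafe_b64encode(fake_image_bytes).decode("ascii")}
    ]
}

# This is the JSON document the projects.predict call would carry
request_json = json.dumps(body)
```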

Then I make predictions with this class:

class MakeQuery:
    def __init__(self, project, model, version=None, client_secret=None):
        # Set the environment variable
        secret_path = os.path.abspath('./client_secret.json')
        os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = secret_path

        # Create service object and prepare the name of the model we are going to query
        self._service = googleapiclient.discovery.build('ml', 'v1')
        self._name = 'projects/{}/models/{}'.format(project, model)
        self._project = self._service.projects()

        if version is not None:
            self._name += '/versions/{}'.format(version)

    def predict(self, instances):
        response = self._project.predict(name=self._name, body=instances).execute()

        if 'error' in response:
            raise RuntimeError(response['error'])

        return response

    def get_prediction(self, img_arrays):
        input_list = [{'input': array_to_base64_websafe_resize(img)} for img in img_arrays]
        input_format = {'instances': input_list}

        return self.predict(input_format)
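The post does not show `array_to_base64_websafe_resize`; a minimal stand-in for its base64 step (resizing omitted, and both function names below are hypothetical) could look like this:

```python
import base64

def bytes_to_base64_websafe(raw):
    """Encode raw bytes with the URL-safe base64 alphabet ('-' and '_'
    instead of '+' and '/'), which is what "web-safe" means here."""
    return base64.urlsafe_b64encode(raw).decode("ascii")

def base64_websafe_to_bytes(encoded):
    """Invert the encoding, e.g. to check a payload locally."""
    return base64.urlsafe_b64decode(encoded)
```

Using the web-safe alphabet matters because `+` and `/` would otherwise need escaping inside the JSON request, and `tf.io.decode_base64` on the serving side expects the web-safe variant.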

Each prediction request then takes 10 seconds! Does ML Engine seriously deploy the model from scratch every time I make a request?

TL;DR: Inference with a TensorFlow SavedModel exported from an Estimator is extremely slow on Google ML Engine.

  • Razer 2017 laptop: 0.9 s
  • Datalab n1-highmem-2: 2.3 s
  • ML Engine: 10 s