Why does my ML model deployment on Azure Container Instance still fail?

Posted 2024-04-26 00:50:11


I am using Azure Machine Learning Service to deploy an ML model as a web service.

I registered a model and now would like to deploy it as an ACI web service, as described in the guide.

To do so, I define

from azureml.core.webservice import Webservice, AciWebservice
from azureml.core.image import ContainerImage

aciconfig = AciWebservice.deploy_configuration(cpu_cores=4, 
                      memory_gb=32, 
                      tags={"data": "text",  "method" : "NB"}, 
                      description='Predict something')

and an image configuration image_config (the code block defining it is missing here)

and create an image

image = ContainerImage.create(name = "scorer-image",
                      models = [model],
                      image_config = image_config,
                      workspace = ws
                      )

The image is created successfully:

Creating image
Image creation operation finished for image scorer-image:5, operation "Succeeded"

Also, troubleshooting by running the image locally on an Azure VM with

sudo docker run -p 8002:5001 myscorer0588419434.azurecr.io/scorer-image:5

allows me to run queries successfully (locally) against http://localhost:8002/score.

However, deploying with

service_name = 'scorer-svc'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                        image = image,
                                        name = service_name,
                                        workspace = ws)

fails with

Creating service
Running.
Failed
ACI service creation operation finished, operation "Failed"
Service creation polling reached terminal state, current service state: Transitioning
Service creation polling reached terminal state, unexpected response received. Transitioning

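I tried setting a more generous memory_gb in aciconfig, but to no avail: the deployment stays in the Transitioning state, which is also what the Azure portal shows when the service is monitored there.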

Also, running service.get_logs() gives me

WebserviceException: Received bad response from Model Management Service: Response Code: 404

What could be the culprit?


1 answer

Answer 1 · Posted 2024-04-26 00:50:11

If an ACI deployment fails, one solution is to try allocating fewer resources, e.g.

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                  memory_gb=8, 
                  tags={"data": "text",  "method" : "NB"}, 
                  description='Predict something')

The error messages thrown are not particularly informative, but this behavior is in fact clearly stated in the documentation:

When a region is under heavy load, you may experience a failure when deploying instances. To mitigate such a deployment failure, try deploying instances with lower resource settings [...]

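The documentation also lists the maximum CPU/RAM resources available in each region (at the time of writing, a deployment requesting memory_gb=32 would likely fail in every region because of insufficient resources).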
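One way to catch an oversized request before deploying is to compare it against the per-region maxima from that documentation. The following is only a sketch: the region name and the numbers in REGION_LIMITS are made-up placeholders rather than real Azure values, and check_request is a hypothetical helper, so the table would have to be filled in from the current documentation.

```python
# Hypothetical per-region maxima as (max_cpu_cores, max_memory_gb).
# These numbers are placeholders, NOT real Azure limits; copy the actual
# values for your region from the ACI documentation.
REGION_LIMITS = {
    "example-region": (4, 16),
}


def check_request(region, cpu_cores, memory_gb, limits=REGION_LIMITS):
    """Return a list of problems with the requested resources (empty if none)."""
    if region not in limits:
        return ["no limits recorded for region %r" % region]
    max_cpu, max_mem = limits[region]
    problems = []
    if cpu_cores > max_cpu:
        problems.append("cpu_cores=%s exceeds the maximum of %s" % (cpu_cores, max_cpu))
    if memory_gb > max_mem:
        problems.append("memory_gb=%s exceeds the maximum of %s" % (memory_gb, max_mem))
    return problems


# Check the configuration from the question against the placeholder limits.
for issue in check_request("example-region", cpu_cores=4, memory_gb=32):
    print(issue)
```

An empty list means the request fits within the recorded limits; it does not guarantee that the region has capacity at deployment time.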

When fewer resources are requested, the deployment should succeed with

Creating service
Running......................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
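The advice to retry with lower resource settings can also be automated. The sketch below is an illustration under assumptions, not part of the azureml SDK: resource_ladder and deploy_with_fallback are hypothetical helpers, and try_deploy is a callable you supply that attempts one deployment and returns True on success. A fake deployment function stands in for the real one here.

```python
def resource_ladder(cpu_cores, memory_gb, min_cpu=1, min_memory_gb=1):
    """Yield (cpu_cores, memory_gb) pairs, halving both down to the minimums."""
    while True:
        yield cpu_cores, memory_gb
        if cpu_cores <= min_cpu and memory_gb <= min_memory_gb:
            return
        cpu_cores = max(min_cpu, cpu_cores // 2)
        memory_gb = max(min_memory_gb, memory_gb // 2)


def deploy_with_fallback(try_deploy, ladder):
    """Call try_deploy(cpu, mem) for each rung; return the first that succeeds."""
    for cpu, mem in ladder:
        if try_deploy(cpu, mem):
            return cpu, mem
    return None  # every rung failed


# Stand-in for a real deployment attempt: pretend anything above 8 GB fails.
def fake_deploy(cpu, mem):
    return mem <= 8


print(deploy_with_fallback(fake_deploy, resource_ladder(4, 32)))
```

With azureml, try_deploy could plausibly build a configuration with AciWebservice.deploy_configuration(cpu_cores=..., memory_gb=...), call Webservice.deploy_from_image(...) as in the question, wait for the deployment to finish, and report whether the service ended up in a healthy state. A failed service would probably need to be deleted before its name is reused; that cleanup is left out of the sketch.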
