Extra dependencies for an AWS SageMaker multi-model endpoint

Data Engineer

Asked 2025-04-12 16:27

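I am trying to deploy a multi-model endpoint on AWS SageMaker. Some of my models have extra dependencies, so I am following the Hugging Face documentation on user-defined code and requirements.

My archive contains a code directory with a requirements.txt file in it, but when I deploy the model and invoke it with the AWS SDK for Python, I get a ModuleNotFound error saying the modules I import cannot be found.

I know the container finds my inference.py file, because the error is raised when that file fails to import those modules.

Note that the models I am deploying were trained and built outside SageMaker, and I want to bring them into SageMaker.

The container image I am using is '763104351884.dkr.ecr.ca-central-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04'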

dependency management hugging face aws sagemaker multi-model endpoints model deployment custom code inference container image

1 Answer

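Hey Lucas,

I think you may be mixing up two different ways of deploying models on SageMaker.

If you want to create a multi-model endpoint then, unfortunately, you need to build a Docker image that meets SageMaker's requirements (which ports to expose, and so on). You can read more about this here.

The Hugging Face guide you are following is written for single-model endpoints, and it does let you use custom dependencies. You could consider creating a single-model endpoint for each of your models, following these steps: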

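Clone the model from Hugging Face with git.
Create a code/ folder inside the model directory and add an inference.py file to it.
Include two functions in the inference file, which must be named model_fn() and predict_fn(). The first runs when the endpoint is initialized and must return the model and tokenizer (or an object that wraps them, such as a pipeline); the second is called on every inference request. You can put your custom logic in predict_fn().
Create an archive (model.tar.gz) containing all the model files, including your custom inference code. The layout should look like this: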

model.tar.gz/
|- pytorch_model.bin
|- ....
|- code/
  |- inference.py
  |- requirements.txt
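
In case it helps, here is a minimal sketch of building that archive from Python with the standard tarfile module. The function name build_model_archive and its arguments are my own for illustration, not something from the Hugging Face guide. The important detail is that files are added relative to the model directory, so code/ ends up at the root of the archive rather than under an extra top-level folder.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir: str, output_path: str = "model.tar.gz") -> list[str]:
    """Pack the contents of model_dir into a gzipped tarball and return the member names.

    Hypothetical helper: paths are stored relative to model_dir so that
    code/inference.py sits at the archive root.
    """
    root = Path(model_dir)
    if not (root / "code" / "inference.py").is_file():
        raise FileNotFoundError("expected code/inference.py inside the model directory")

    out = Path(output_path).resolve()
    members = []
    with tarfile.open(output_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside model_dir
            if path.is_file() and path.resolve() != out:
                arcname = path.relative_to(root).as_posix()
                tar.add(path, arcname=arcname)
                members.append(arcname)
    return members
```

Calling build_model_archive("my_model") would write model.tar.gz in the current directory; listing the returned names is a quick way to confirm that code/requirements.txt is where the container expects it.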

Finally, upload the archive to S3 and pass its S3 URI to SageMaker when you create the model/endpoint.
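
As a rough sketch of that last step using boto3 (the Hugging Face notebook uses the higher-level sagemaker SDK instead, so treat this as one possible route): the helper names, bucket, key, model name and role ARN below are all placeholders I made up, and the image URI would be the one from your question. This only registers the model; you would still need to create an endpoint configuration and the endpoint itself.

```python
def build_create_model_request(model_name, role_arn, image_uri, model_data_url):
    """Return keyword arguments for the boto3 SageMaker client's create_model call."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,                # inference container image
            "ModelDataUrl": model_data_url,    # s3:// URI of model.tar.gz
        },
    }


def upload_and_register(archive_path, bucket, key, model_name, role_arn, image_uri):
    """Upload the archive to S3, then register it as a SageMaker model.

    Hypothetical helper; requires AWS credentials and the boto3 package.
    """
    import boto3  # imported lazily so the request builder above works offline

    boto3.client("s3").upload_file(archive_path, bucket, key)
    request = build_create_model_request(
        model_name, role_arn, image_uri, f"s3://{bucket}/{key}"
    )
    return boto3.client("sagemaker").create_model(**request)
```

Keeping the request in a plain dict makes it easy to print and check the S3 URI before anything is sent to AWS.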

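Hugging Face has an excellent notebook that covers the whole process. It is the best guide I have found. If you copy it verbatim and change only the inference.py script, it should work.

Here is an example inference.py that I have used before. As you can see, Hugging Face pipelines work too!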

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# DirectQuoteUtils is a custom helper module, not part of transformers
from DirectQuoteUtils import reformat


def model_fn(model_dir):
    """Called once when the endpoint starts: load the model from model_dir."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    return pipeline("ner", model=model, tokenizer=tokenizer)


def predict_fn(data, pipe):
    """Called on every request; pipe is whatever model_fn returned."""
    # The second argument is named pipe so it does not shadow the imported pipeline().
    # FORMAT FOR MODEL INPUT:
    # {               # list of strings
    #     "inputs": ["Donald Trump is the president of the US", "Joe Biden is the United States president"]
    # }
    predictions = pipe(data["inputs"])

    # Clean up each prediction with the custom helper
    outputs = [reformat(prediction) for prediction in predictions]

    return {
        # "device": device, # handy to check if CUDA is being used
        "outputs": outputs
    }
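
For completeness, a sketch of calling an endpoint that runs this script from the AWS SDK for Python. The function names and the endpoint name you pass in are placeholders of mine; the request body simply mirrors the {"inputs": [...]} format that predict_fn reads above.

```python
import json


def build_request_body(texts):
    """Serialize a list of strings into the {"inputs": [...]} shape predict_fn reads."""
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise TypeError("texts must be a list of strings")
    return json.dumps({"inputs": texts})


def invoke(endpoint_name, texts):
    """Send the texts to the endpoint and return the decoded JSON response.

    Hypothetical helper; requires AWS credentials and the boto3 package.
    """
    import boto3  # imported lazily so build_request_body can be used offline

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_request_body(texts),
    )
    return json.loads(response["Body"].read())
```

If the endpoint is healthy, the decoded response should have the same shape as the dictionary predict_fn returns, with the results under "outputs".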

Answered 2025-04-12 by Python Master
