Scikit-learn import error when deploying to Vercel

-3 votes
2 answers
244 views
Asked 2025-04-14 16:11

I'm deploying a Flask chatbot backend to Vercel. The project uses scikit-learn (a.k.a. sklearn) to train the model, but it isn't actually needed in the chatbot application itself.

During deployment I get an error that says:

LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html [ERROR] 
Runtime.ImportModuleError: Unable to import module 'vc__handler__python': No module named 'sklearn' Traceback (most recent call last): 

I searched online for a solution but couldn't find a fix for this specific situation.

Code snippet:

from flask import Flask, request, jsonify, render_template
from flask_cors import CORS
import json
import random
import wikipedia
import re
import time
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
import joblib

app = Flask(__name__)
CORS(app)
# Load intents data from JSON file
intents = json.loads(open('intents.json').read())

# Load preprocessed data
words = joblib.load('words.pkl')
classes = joblib.load('classes.pkl')
nb_classifier = joblib.load('nb_classifier.joblib')

lemmatizer = WordNetLemmatizer()

# Function to clean up a sentence by tokenizing and lemmatizing its words
def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words

# Function to convert a sentence into bag of words representation
def bow(sentence, words):
    sentence_words = clean_up_sentence(sentence)
    bag = [1 if lemmatizer.lemmatize(word.lower()) in sentence_words else 0 for word in words]
    return np.array(bag)

# Function to predict the intent class of a given sentence
def predict_class(sentence):
    p = bow(sentence, words)
    res = nb_classifier.predict(np.array([p]))[0]
    return_list = [{"intent": classes[res], "probability": "1"}]
    return return_list

# Function to get a response based on predicted intent
def get_response(ints, intents_json):
    tag = ints[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result

# Function to extract subject from a question
def extract_subject(question):
    punctuation_marks = ['.', ',', '!', '?', ':', ';', "'", '"', '(', ')', '[', ']', '-', '—', '...', '/', '\\', '&', '*', '%', '$', '#', '@', '+', '-', '=', '<', '>', '_', '|', '~', '^']
    for punctuation_mark in punctuation_marks:
        if punctuation_mark in question:
            question = question.replace(punctuation_mark, '')
    
    subject = ''
    words = question.split(' ')
    list_size = len(words)

    # Skip the first two words (e.g. "what is") and join the rest as the subject
    for i in range(list_size):
        if i > 1 and i != list_size - 1:
            subject += words[i] + ' '
        elif i > 1 and i == list_size - 1:
            subject += words[i]
    return subject

# Function to clean text by removing characters within parentheses
def clean_text(text):
    cleaned_text = re.sub(r'\([^()]*\)', '', text)
    cleaned_text = cleaned_text.strip()
    return cleaned_text

# Function to search Wikipedia for information based on a question
def search_wikipedia(question, num_sentences=2):
    try:
        subject = extract_subject(question)
        wiki_result = wikipedia.summary(subject, auto_suggest=False, sentences=num_sentences)
        return clean_text(wiki_result)
    except wikipedia.exceptions.PageError:
        return f"Sorry, I couldn't find information about {subject}."
    except wikipedia.exceptions.DisambiguationError as e:
        return f"Multiple matches found. Try being more specific: {', '.join(e.options)}"
    except Exception as e:
        return "Error, Something went wrong!"

# Function to get a response from the chatbot
def chatbot_response(text):
    ints = predict_class(text)
    res = get_response(ints, intents)
    return res

@app.route('/chat', methods=['POST'])
def chat():
    user_text = request.form['user_input']
    bot_response = chatbot_response(user_text)
    return jsonify({'response': bot_response})

if __name__ == '__main__':
    app.run(debug=True)

requirements.txt

Flask==3.0.2
Flask-Cors==4.0.0
joblib==1.3.2
nltk==3.8.1
numpy==1.26.3
wikipedia==1.4.0

Project structure:

│   .gitignore
│   README.md
│   requirements.txt
│   vercel.json
└───api
        classes.pkl
        index.py
        intents.json
        nb_classifier.joblib
        words.pkl

Please help me find a solution.

2 Answers

5

Going by the hint in the error message:

[ERROR] Runtime.ImportModuleError: Unable to import module
'vc__handler__python': No module named 'sklearn' 
Traceback (most recent call last):

and given that you use quite a few third-party packages, it is very likely that you need to use a deployment package, which you are not doing right now!

Here are the steps to build a deployment package:

  1. Create an empty folder called package inside your project folder.
  2. Create a requirements.txt file that lists every dependency your function needs. Even though your Flask code doesn't appear to need sklearn, make sure you list the dependencies of all the code your Lambda function references (a concrete example for this project follows the list).
  3. Install the dependencies directly into your deployment package by running the following pip command:
    pip install --target ./package/ --requirement requirements.txt

  4. Put your lambda_function.py file into that folder.
  5. Zip the new folder into a zip file, for example:
    zip -r ./my_deployment_package.zip ./package/ # don't forget the -r!

  6. Upload your deployment package in the Code source section of the Lambda function page (if you run into trouble, see the official instructions here).
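To make step 2 concrete for the project in the question: nb_classifier.joblib was produced with scikit-learn, so the package must be installed in the runtime even though the Flask code never imports it directly. A sketch of the updated requirements.txt (the scikit-learn version pin is illustrative, not taken from the original post):

Flask==3.0.2
Flask-Cors==4.0.0
joblib==1.3.2
nltk==3.8.1
numpy==1.26.3
wikipedia==1.4.0
scikit-learn==1.4.2

On Vercel itself, the Python runtime installs whatever is listed in requirements.txt, so committing this one-line change is the equivalent of this step there.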

For more details and alternative approaches, see this link.

0

I looked at the nb_classifier.joblib file and found that it was created with the sklearn/scikit-learn module.
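
A quick way to confirm this yourself, as a minimal sketch (it assumes the file is loaded with joblib exactly as in the question): unpickling an object whose class lives in sklearn re-imports sklearn, so loading the file in an environment without scikit-learn reproduces the same error Vercel reports.

import joblib

# Run this in an environment where scikit-learn is NOT installed.
# If the pickled object was built from sklearn classes, unpickling re-imports
# sklearn and fails with the same "No module named 'sklearn'" error as on Vercel.
try:
    model = joblib.load('nb_classifier.joblib')
    print('Loaded:', type(model))
except ModuleNotFoundError as e:
    print('The model file needs a package that is not installed:', e.name)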

So I replaced it with a newly created file, trained with the following code:

import random
import json
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier
import joblib

lemmatizer = WordNetLemmatizer()

# Load intents data
intents = json.loads(open('intents.json').read())

words = []
classes = []
documents = []
ignore_letters = ['?', '!', '.', ',']

# Extract words and classes from intents
for intent in intents['intents']:
    for pattern in intent['patterns']:
        word_list = word_tokenize(pattern)
        words.extend(word_list)
        documents.append((word_list, intent['tag']))

        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# Lemmatize words and remove ignored characters
words = [lemmatizer.lemmatize(word.lower()) for word in words if word not in ignore_letters]
words = sorted(set(words))
classes = sorted(set(classes))

# Define a function to extract features (lemmatize the document's tokens so they
# match the lemmatized vocabulary in `words`)
def extract_features(document):
    document_words = set(lemmatizer.lemmatize(word.lower()) for word in document)
    features = {}
    for word in words:
        features[word] = (word in document_words)
    return features

# Prepare training data
training_set = [(extract_features(doc), tag) for doc, tag in documents]

# Train a Naive Bayes classifier
nb_classifier = NaiveBayesClassifier.train(training_set)

# Save the model and associated files
joblib.dump(words, 'words2.pkl')
joblib.dump(classes, 'classes2.pkl')
joblib.dump(nb_classifier, 'nb_classifier2.joblib')

print('Done')

With this, the sklearn module is no longer needed.
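
One caveat (an assumption based on the code in the question, not something covered above): nltk's NaiveBayesClassifier is called with classify() on a feature dict and returns the label directly, whereas the original predict_class uses the sklearn-style predict() on a NumPy array. So the Flask side would likely need a small adjustment as well; a rough sketch, assuming words and nb_classifier are the objects saved by the training script above:

# Sketch of predict_class adapted to the nltk classifier trained above
def predict_class(sentence):
    sentence_words = [lemmatizer.lemmatize(w.lower()) for w in nltk.word_tokenize(sentence)]
    features = {word: (word in sentence_words) for word in words}
    tag = nb_classifier.classify(features)  # nltk returns the intent tag itself
    return [{"intent": tag, "probability": "1"}]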
