Scikit-learn import error when deploying to Vercel
I'm deploying a Flask chatbot backend to Vercel. The project used scikit-learn (a.k.a. sklearn) to train the model, but the chatbot application itself doesn't actually need it.
When deploying, I get an error that says:
LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html [ERROR]
Runtime.ImportModuleError: Unable to import module 'vc__handler__python': No module named 'sklearn' Traceback (most recent call last):
I searched online for a solution but couldn't find a specific fix for this situation.
Code snippet:
from flask import Flask, request, jsonify, render_template
from flask_cors import CORS
import json
import random
import wikipedia
import re
import time
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
import joblib
app = Flask(__name__)
CORS(app)
# Load intents data from JSON file
intents = json.loads(open('intents.json').read())
# Load preprocessed data
words = joblib.load('words.pkl')
classes = joblib.load('classes.pkl')
nb_classifier = joblib.load('nb_classifier.joblib')
lemmatizer = WordNetLemmatizer()
# Function to clean up a sentence by tokenizing and lemmatizing its words
def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words

# Function to convert a sentence into bag of words representation
def bow(sentence, words):
    sentence_words = clean_up_sentence(sentence)
    bag = [1 if lemmatizer.lemmatize(word.lower()) in sentence_words else 0 for word in words]
    return np.array(bag)

# Function to predict the intent class of a given sentence
def predict_class(sentence):
    p = bow(sentence, words)
    res = nb_classifier.predict(np.array([p]))[0]
    return_list = [{"intent": classes[res], "probability": "1"}]
    return return_list

# Function to get a response based on predicted intent
def get_response(ints, intents_json):
    tag = ints[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result
# Function to extract subject from a question
def extract_subject(question):
    punctuation_marks = ['.', ',', '!', '?', ':', ';', "'", '"', '(', ')', '[', ']', '-', '—', '...', '/', '\\', '&', '*', '%', '$', '#', '@', '+', '-', '=', '<', '>', '_', '|', '~', '^']
    for punctuation_mark in punctuation_marks:
        if punctuation_mark in question:
            question = question.replace(punctuation_mark, '')
    subject = ''
    words = question.split(' ')
    list_size = len(words)
    for i in range(list_size):
        if i > 1 and i != list_size:
            subject += words[i]+' '
        elif i == list_size:
            subject += words[i]
    return subject
# Function to clean text by removing characters within parentheses
def clean_text(text):
    cleaned_text = re.sub(r'\([^()]*\)', '', text)
    cleaned_text = cleaned_text.strip()
    return cleaned_text

# Function to search Wikipedia for information based on a question
def search_wikipedia(question, num_sentences=2):
    try:
        subject = extract_subject(question)
        wiki_result = wikipedia.summary(subject, auto_suggest=False, sentences=num_sentences)
        return clean_text(wiki_result)
    except wikipedia.exceptions.PageError:
        return f"Sorry, I couldn't find information about {subject}."
    except wikipedia.exceptions.DisambiguationError as e:
        return f"Multiple matches found. Try being more specific: {', '.join(e.options)}"
    except Exception as e:
        return "Error, Something went wrong!"
# Function to get a response from the chatbot
def chatbot_response(text):
    ints = predict_class(text)
    res = get_response(ints, intents)
    return res

@app.route('/chat', methods=['POST'])
def chat():
    user_text = request.form['user_input']
    bot_response = chatbot_response(user_text)
    return jsonify({'response': bot_response})

if __name__ == '__main__':
    app.run(debug=True)
requirements.txt
Flask==3.0.2
Flask-Cors==4.0.0
joblib==1.3.2
nltk==3.8.1
numpy==1.26.3
wikipedia==1.4.0
Project structure:
│   .gitignore
│   README.md
│   requirements.txt
│   vercel.json
└───api
        classes.pkl
        index.py
        intents.json
        nb_classifier.joblib
        words.pkl
Please help me find a solution.
2 Answers
Answer 1 (5 votes)
Going by the error message:
[ERROR] Runtime.ImportModuleError: Unable to import module
'vc__handler__python': No module named 'sklearn'
Traceback (most recent call last):
And since you are using quite a few third-party packages, you most likely need a deployment package, but you aren't using one right now!
Here are the steps to build a deployment package:
- Create a new, empty folder named package inside your project folder.
- Create a requirements.txt file listing all the packages your function depends on. Even though your Flask code doesn't appear to need sklearn, make sure to list the dependencies of every piece of code your Lambda function references.
- Install the dependencies directly into your deployment package by running the following pip command:
pip install --target ./package/ --requirement requirements.txt
- Put your lambda_function.py file into this folder.
- Compress the new folder into a zip file, for example:
zip -r ./my_deployment_package.zip ./package/ # don't forget the -r!
- Upload your deployment package in the Code source section of the Lambda function page (if you run into trouble, see the official instructions here).
For more details and other approaches, see this link.
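For this particular project, which deploys through Vercel rather than a hand-built Lambda zip, the practical equivalent of the requirements.txt step above is making sure the requirements.txt at the repository root lists every package that api/index.py needs at import or unpickling time. A minimal sketch, assuming you keep the sklearn-trained model (the scikit-learn version below is an assumption; pin whatever version the model was actually trained with):
Flask==3.0.2
Flask-Cors==4.0.0
joblib==1.3.2
nltk==3.8.1
numpy==1.26.3
wikipedia==1.4.0
scikit-learn==1.4.0  # assumption: match the version used to train nb_classifier.joblib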
Answer 2 (0 votes)
I looked at the nb_classifier.joblib file and found that it had been built with the sklearn/scikit-learn module.
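A rough way to confirm this, assuming scikit-learn is installed in your local environment (without it, joblib.load fails with the same missing-module error):
import joblib

# Loading the file locally shows which library produced the pickled object;
# an sklearn estimator prints something like <class 'sklearn.naive_bayes.MultinomialNB'>.
model = joblib.load('nb_classifier.joblib')
print(type(model))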
So I simply replaced it with a newly created file, trained with the following code:
import random
import json
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier
import joblib
lemmatizer = WordNetLemmatizer()
# Load intents data
intents = json.loads(open('intents.json').read())
words = []
classes = []
documents = []
ignore_letters = ['?', '!', '.', ',']
# Extract words and classes from intents
for intent in intents['intents']:
    for pattern in intent['patterns']:
        word_list = word_tokenize(pattern)
        words.extend(word_list)
        documents.append((word_list, intent['tag']))
        if intent['tag'] not in classes:
            classes.append(intent['tag'])
# Lemmatize words and remove ignored characters
words = [lemmatizer.lemmatize(word.lower()) for word in words if word not in ignore_letters]
words = sorted(set(words))
classes = sorted(set(classes))
# Define a function to extract features
def extract_features(document):
    document_words = set(document)
    features = {}
    for word in words:
        features[word] = (word in document_words)
    return features
# Prepare training data
training_set = [(extract_features(doc), tag) for doc, tag in documents]
# Train a Naive Bayes classifier
nb_classifier = NaiveBayesClassifier.train(training_set)
# Save the model and associated files
joblib.dump(words, 'words2.pkl')
joblib.dump(classes, 'classes2.pkl')
joblib.dump(nb_classifier, 'nb_classifier2.joblib')
print('Done')
This way, the sklearn module is no longer needed.
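One caveat: nltk's NaiveBayesClassifier is queried with classify(featureset) on a dict of features, not with sklearn's predict() on a numpy array, so the predict_class function in index.py needs a small matching change. A rough sketch, assuming words and nb_classifier are now loaded from the new words2.pkl and nb_classifier2.joblib files:
# Sketch: query the nltk classifier instead of an sklearn estimator.
def predict_class(sentence):
    sentence_words = clean_up_sentence(sentence)
    # nltk classifiers expect a {feature_name: value} dict, mirroring extract_features()
    features = {word: (word in sentence_words) for word in words}
    tag = nb_classifier.classify(features)  # classify() returns the intent tag directly
    return [{"intent": tag, "probability": "1"}]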