How to prevent memory leaks with Flask and NLTK

0 votes

1 answer

2489 views

Asked 2025-04-17 17:39

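I am building a web application with NLTK and Flask. It is just a simple RESTful app that I deployed on Heroku, and everything went fine at first. However, once the server started receiving more requests, I hit Heroku's memory limit, which is only 1.5GB. My guess is that this happens because I load nltk.RegexpParser every time a request comes in.

Here is the code; it is really simple.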



@app.route('/get_keywords', methods=['POST'])
def get_keywords():
    data_json = json.loads(request.data)
    text = urllib.unquote(data_json["sentence"])
    keywords = KeywordExtraction().extract(text)

    return ','.join(keywords)

This is the keyword extraction part.


import re
import nltk

nltk.data.path.append('./nltk_data/')

from nltk.corpus import stopwords

class KeywordExtraction:
    def extract(self, text):

        sentences = nltk.sent_tokenize(text)
        sentences = [nltk.word_tokenize(sent) for sent in sentences]
        sentences = [nltk.pos_tag(sent) for sent in sentences]

        grammar = "NP: {}"
        cp = nltk.RegexpParser(grammar)
        tree = cp.parse(sentences[0])

        keywords = [subtree.leaves()[0][0] for subtree in tree.subtrees(filter=lambda t: t.node == 'NP')]
        keywords_without_stopwords = [w for w in keywords if not w in stopwords.words('english')]

        return list(set(keywords_without_stopwords + tags))

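I am not sure whether the problem lies in my code, in Flask, or in NLTK. I am still quite new to Python. Any suggestions would be much appreciated.

I load-tested it with blitz.io, and after only 250 requests the server crashed and started throwing R15 (memory quota exceeded) errors.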

memory-leak web-application server-crash nltk flask heroku keyword-extraction restful

1 answer

First, start caching:

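# Move these outside of the class declaration or make them class variables,
# so they are built once at import time rather than on every request.
# Note: this rebinds the imported name `stopwords` to a plain set.

stopwords = set(stopwords.words('english'))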
grammar = "NP: {}"
cp = nltk.RegexpParser(grammar)
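
To illustrate why moving these objects out of the request path helps, here is a minimal sketch of the "build once, reuse on every request" pattern. It assumes nothing about NLTK internals: `SharedResources`, `fake_load_stopwords`, `fake_build_parser` and `handle_request` are hypothetical names, and the two fake loaders stand in for `stopwords.words('english')` and `nltk.RegexpParser(grammar)` so the example runs without NLTK installed.

```python
class SharedResources:
    """Holds objects that are expensive to build so they are created only once."""

    def __init__(self, load_stopwords, build_parser):
        # Both callables run exactly once, when this object is constructed.
        self.stopwords = frozenset(load_stopwords())
        self.parser = build_parser()


# Counters that record how often each expensive loader actually runs.
load_count = {"stopwords": 0, "parser": 0}


def fake_load_stopwords():
    # Hypothetical stand-in for stopwords.words('english').
    load_count["stopwords"] += 1
    return ["the", "a", "of"]


def fake_build_parser():
    # Hypothetical stand-in for nltk.RegexpParser(grammar).
    load_count["parser"] += 1
    return object()


# Created once at import time, before any request is served.
RESOURCES = SharedResources(fake_load_stopwords, fake_build_parser)


def handle_request(keywords):
    """Per-request work only reads the shared objects; nothing is rebuilt."""
    return [w for w in keywords if w not in RESOURCES.stopwords]


# Simulate three incoming requests.
for _ in range(3):
    result = handle_request(["the", "parser", "of", "trees"])
```

After the three simulated requests, each loader has still run only once, which is the behaviour the module-level variables above are meant to achieve.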

The filtering step can also be sped up a little:

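from itertools import ifilterfalse  # Python 2; on Python 3 this is itertools.filterfalse

...

# `stopwords` is the cached set from above, so each membership test is O(1)
keywords_without_stopwords = ifilterfalse(stopwords.__contains__, keywords)

# ifilterfalse returns an iterator, which cannot be added to a set; take the union instead
return list(set(keywords_without_stopwords) | set(tags))  # Can you cache `set(tags)`?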
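
For readers on Python 3, the same filtering idea can be sketched with `itertools.filterfalse`. This is only an illustration under stated assumptions: `drop_stopwords` is a hypothetical helper, the stopword set and tags are made-up sample data, and the result is sorted purely to make the output deterministic.

```python
from itertools import filterfalse


def drop_stopwords(keywords, stopword_set, tags=()):
    """Remove stopwords, merge in extra tags, and return unique entries."""
    # filterfalse keeps the items for which the predicate is False,
    # i.e. the keywords that are NOT in the stopword set.
    kept = filterfalse(stopword_set.__contains__, keywords)
    # Union of two sets removes duplicates; sorting gives a stable order.
    return sorted(set(kept) | set(tags))


# Sample data (hypothetical): a tiny stopword set and one extra tag.
sample = drop_stopwords(["the", "cat", "cat", "of", "hat"], {"the", "of"}, ["pet"])
```

Because `stopword_set` is a set, each membership check takes constant time on average, whereas checking against the list returned by `stopwords.words('english')` scans the whole list for every keyword.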

I would also suggest taking a look at Flask-Cache, so that you can memoize and cache functions and views wherever possible.

Answered 2025-04-17 by Python大师
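
As a rough illustration of what memoizing the extraction function buys you, here is a sketch that uses the standard library's `functools.lru_cache` as a stand-in for Flask-Cache (it is not Flask-Cache's API). `extract_keywords_uncached` is a hypothetical placeholder for `KeywordExtraction().extract(text)`, and the cache size of 1024 is an arbitrary assumption.

```python
from functools import lru_cache

# Counts how many times the expensive function really executes.
call_count = 0


def extract_keywords_uncached(text):
    """Hypothetical stand-in for KeywordExtraction().extract(text)."""
    global call_count
    call_count += 1
    # Toy logic: keep unique lowercase words longer than three characters.
    return tuple(sorted({w.lower() for w in text.split() if len(w) > 3}))


# maxsize bounds the cache; an unbounded cache would itself keep growing
# in memory, which matters on a dyno with a fixed memory quota.
@lru_cache(maxsize=1024)
def extract_keywords(text):
    return extract_keywords_uncached(text)


# The second call with the same text is served from the cache.
first = extract_keywords("Flask serves the parsed sentence")
second = extract_keywords("Flask serves the parsed sentence")
```

Note that caching trades memory for speed, so whichever cache is used, it is worth giving it a size limit or a timeout rather than letting it grow indefinitely.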
