Memory leak when reading files from Google Cloud Storage on Google App Engine (Python)

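import webapp2
import cloudstorage as gcs


class ReadGSFile(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = "file type"
        read_path = "path/to/file"
        # Read the file in chunks of 1,000,000 bytes and write each chunk to the response.
        # The with block closes the file, so no explicit fp.close() is needed.
        with gcs.open(read_path, 'r') as fp:
            buf = fp.read(1000000)
            while buf:
                self.response.out.write(buf)
                buf = fp.read(1000000)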

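3 answers

User

Answer 1 · edited 2024-05-14 16:50:08

I ran into a similar problem. My code downloads a fairly large number of 1-10 MB files in sequence, does some processing on each of them, and then posts the results to the cloud.

I saw a severe memory leak that made it impossible to process more than 50-100 downloads in a row.

Reluctant to rewrite the download code to use Blobstore, I tried one last experiment: calling the garbage collector manually after each download: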

import gc
gc.collect()

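I have now been running the code for a few minutes without any "Exceeded soft private memory limit" errors, and the instance's memory footprint seems to grow at a much slower rate.

Of course, this may just be luck: the footprint is still creeping up, though with occasional drops, and the instance has already served 2000 requests.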
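
The workaround above can be sketched as a loop that forces a collection pass after every file. This is only an illustration under assumptions: `process_all`, `download`, and `process` are hypothetical names, with `download` standing in for the Cloud Storage read and `process` for the per-file work, neither of which the answer shows.

```python
import gc


def process_all(paths, download, process):
    """Download and process each path in turn, collecting garbage after each one."""
    results = []
    for path in paths:
        data = download(path)      # hypothetical callable, e.g. a GCS read
        results.append(process(data))
        del data                   # drop the last reference before collecting
        gc.collect()               # the manual collection described in the answer
    return results
```

Whether this actually frees memory depends on what is holding the references; as the answer notes, it slowed the growth in one case rather than eliminating it.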

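User

Answer 2 · edited 2024-05-14 16:50:08

Following the comment from user voscausa above, I changed the file download approach to serve the files through Blobstore. That resolved the memory leak.

Reference: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage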

import urllib2  # needed for urllib2.quote() below

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class GCSServingHandler(blobstore_handlers.BlobstoreDownloadHandler):
  def get(self):
    read_path = "/path/to/gcs file/"  # The leading chars should not be "/gs/"
    blob_key  = blobstore.create_gs_key("/gs/" + read_path)

    f_name = "file name"
    f_type = "file type" # Such as 'text/plain'

    self.response.headers['Content-Type'] = f_type
    self.response.headers['Content-Disposition'] = "attachment; filename=\"%s\";"%f_name
    self.response.headers['Content-Disposition'] += " filename*=utf-8''" + urllib2.quote(f_name.encode("utf8"))

    self.send_blob(blob_key)
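
Because the handler above prepends "/gs/" to `read_path`, a path that already starts with "/" or "/gs/" ends up with a doubled separator or prefix. A small helper along these lines could normalize the path before it is passed to `blobstore.create_gs_key`. The name `to_blobstore_path` is hypothetical, and the sketch assumes the expected form is "/gs/bucket/object" and that no bucket is literally named "gs".

```python
def to_blobstore_path(gcs_path):
    """Return gcs_path in the '/gs/bucket/object' form, with exactly one '/gs/' prefix."""
    # Remove an existing "/gs/" prefix so it is not added twice.
    if gcs_path.startswith("/gs/"):
        gcs_path = gcs_path[len("/gs/"):]
    # Remove any leading slashes so the result has no "//".
    return "/gs/" + gcs_path.lstrip("/")
```

In the handler this would be used as `blobstore.create_gs_key(to_blobstore_path(read_path))`.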

User

Answer 3 · edited 2024-05-14 16:50:08

Try clearing the contents of the in-context cache.

from google.appengine.ext import ndb

context = ndb.get_context()
context.clear_cache()

The NDB caching documentation explains:

With executing long-running queries in background tasks, it's possible for the in-context cache to consume large amounts of memory. This is because the cache keeps a copy of every entity that is retrieved or stored in the current context. To avoid memory exceptions in long-running tasks, you can disable the cache or set a policy that excludes whichever entities are consuming the most memory.
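
For a long-running task, one way to apply this is to clear the cache every so many items instead of only once. The sketch below is an illustration, not the documented recipe: `process_with_cache_clearing`, `handle`, and `every` are hypothetical names, and `context` is assumed to be the object returned by `ndb.get_context()`.

```python
def process_with_cache_clearing(items, handle, context, every=100):
    """Call handle(item) for each item, clearing the context cache every `every` items."""
    processed = 0
    for item in items:
        handle(item)
        processed += 1
        if processed % every == 0:
            # Drop the entities cached so far in this context.
            context.clear_cache()
    return processed
```

To disable the in-context cache altogether, as the quoted passage suggests, NDB also offers `context.set_cache_policy(False)`.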

You can also try clearing the webapp2 response object's buffer. Insert this line before the loop:

self.response.clear()

The response buffers all output in memory, then sends the final output when the handler exits. webapp2 does not support streaming data to the client. The clear() method erases the contents of the output buffer, leaving it empty.
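
One consequence of this buffering is that reading the file in chunks limits only the read side: every chunk written to the response stays in memory until the handler returns. The sketch below illustrates that with `io.BytesIO` standing in for `self.response.out`. This is an assumption made for illustration, not webapp2's actual class, and `copy_in_chunks` is a hypothetical name.

```python
import io


def copy_in_chunks(src, dst, chunk_size=1000000):
    """Copy src to dst in chunk_size pieces and return the number of bytes copied."""
    total = 0
    while True:
        buf = src.read(chunk_size)
        if not buf:
            break
        dst.write(buf)   # a buffering sink keeps every chunk it receives
        total += len(buf)
    return total


source = io.BytesIO(b"x" * 2500)
sink = io.BytesIO()      # stand-in for a response that buffers all output
copied = copy_in_chunks(source, sink, chunk_size=1000)
# The sink now holds the whole payload, not just the last chunk.
assert len(sink.getvalue()) == copied
```

This is why serving through Blobstore, which never routes the file body through the handler, behaves differently from the chunked loop in the question.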

