Python/Django database polling memory problem

import time
from multiprocessing import Queue

from django.conf import settings

from requisitions.models import Requisition  # our Django model

queue = Queue()  # consumed by the worker processes

while True:
    # Wait for "N"ew requisitions, then pop them into the queue.
    for pr in Requisition.objects.filter(status=Requisition.STATUS_NEW):
        pr.set_status(pr.STATUS_WORKING)
        pr.save()
        queue.put(pr.id)

    time.sleep(settings.DAEMON_POLL_WAIT)

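import time

from django.conf import settings
from django.core.cache import cache

from requisitions.models import Requisition  # our Django model


# Defined before the loop; otherwise the name does not exist yet when it is called
def process_new_requisitions(queue):
    for pr in Requisition.objects.filter(status=Requisition.STATUS_NEW):
        pr.set_status(pr.STATUS_WORKING)
        pr.save()
        queue.put(pr.id)


# queue is the same multiprocessing Queue used by the first loop
while True:
    time.sleep(settings.DAEMON_POLL_WAIT)
    if cache.get('new_requisitions'):
        # Possible race condition
        cache.clear()
        process_new_requisitions(queue)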

3 answers

User

Answer 1 · edited 2024-04-25 20:48:53

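I had a lot of data processing to do, so my solution was to use multiprocessing with pools to offset whatever memory bloat was happening: each pool's worker processes exit when the pool is joined, and the memory they accumulated goes away with them.

To keep it simple, I just defined some "global" (top-level, or whatever the Python term is) functions instead of trying to make things picklable.

Here it is in abstract form: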

import multiprocessing as mp

WORKERS = 16  # I had 7 cores, allocated 16 because processing was I/O bound


# This is a global (module-level) function, so multiprocessing can pickle it
def worker(params):
    # do stuff
    something_for_the_callback_to_analyze = params  # placeholder result
    return something_for_the_callback_to_analyze


# This is a global function; the pool calls it in the parent process
# with worker's return value
def worker_callback(worker_return_value):
    # report stuff, or pass
    pass


# My multiprocess_launcher was inside of a class
def multiprocess_launcher(params):
    # somehow define a collection (assumption: a list of work items built from params)
    collection = list(params)
    while collection:
        # Take a slice of at most WORKERS items
        pool_sub_batch = []
        for _ in range(WORKERS):
            if collection:  # as long as there's still something in the collection
                pool_sub_batch.append(collection.pop())
        # Start a fresh pool, limited to the slice; its processes (and any
        # memory they built up) go away once the pool is joined
        pool = mp.Pool(processes=min(WORKERS, len(pool_sub_batch)))
        for sub_batch in pool_sub_batch:
            # args must be a tuple: (sub_batch,), not (sub_batch)
            pool.apply_async(worker, args=(sub_batch,), callback=worker_callback)
        pool.close()
        pool.join()
        # Loop, more slices
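A variation on the same idea, offered as a sketch rather than the answerer's code: instead of building a new pool for every slice, `multiprocessing.Pool` accepts `maxtasksperchild`, which makes the pool replace each worker process after it has completed that many tasks, so memory a task leaves behind is released when its process exits. `worker`, `worker_with_pid`, `run_all` and the worker count are illustrative assumptions; `chunksize=1` keeps every item a separate task for that count.

```python
import multiprocessing as mp
import os

WORKERS = 4  # illustrative; the answer above used 16 for I/O-bound work


def worker(item):
    # Placeholder for the real per-item processing (hypothetical).
    return item * item


def worker_with_pid(item):
    # Same work, but also report which process handled the item.
    return worker(item), os.getpid()


def run_all(items, workers=WORKERS, tasks_per_child=1):
    # maxtasksperchild: after a worker process has completed this many tasks,
    # the pool retires it and starts a fresh one, so whatever memory the old
    # process accumulated is returned to the OS when it exits.
    with mp.Pool(processes=workers, maxtasksperchild=tasks_per_child) as pool:
        # chunksize=1 so each item counts as one task for maxtasksperchild
        return pool.map(worker_with_pid, items, chunksize=1)


if __name__ == "__main__":
    results = run_all(range(10))
    print("results:", [value for value, _pid in results])
    print("distinct worker PIDs:", len({pid for _value, pid in results}))
```

Run as a script, it prints the squared values and, since every process is retired after one task, typically as many distinct worker PIDs as items. A larger `tasks_per_child` trades less process churn for more memory growth per process.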

User

Answer 2 · edited 2024-04-25 20:48:53

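You need to periodically reset the list of queries that Django keeps for debugging purposes (it records them only when DEBUG = True). Normally it is cleared after every request, but since your application isn't request-based, you have to do this manually: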

from django import db

db.reset_queries()
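To make "reset after each working cycle" concrete, here is a minimal sketch of a polling loop that clears the query log at the end of every cycle. `poll_loop` and its parameters (`process_cycle`, `reset_queries`, `max_cycles`, `sleep`) are hypothetical; the reset function is passed in so the loop can run without a configured Django project.

```python
import time


def poll_loop(process_cycle, reset_queries, poll_wait, max_cycles=None, sleep=time.sleep):
    """Run one polling cycle per iteration and clear the query log after each.

    In a Django daemon, reset_queries would be django.db.reset_queries.
    max_cycles=None means loop forever; a number is handy for testing.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            process_cycle()
        finally:
            # Clear the recorded queries even if the cycle raised,
            # so the log cannot keep growing from one cycle to the next.
            reset_queries()
        sleep(poll_wait)
        cycles += 1
    return cycles
```

In the daemon from the question this might be wired up as `poll_loop(lambda: process_new_requisitions(queue), db.reset_queries, settings.DAEMON_POLL_WAIT)`, reusing the question's names.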

See also:

Mikko Ohtamaa's "Debugging Django memory leak with TrackRefs and Guppy":
Django keeps track of all queries for debugging purposes (connection.queries). This list is reseted at the end of HTTP request. But in standalone mode, there are no requests. So you need to manually reset to queries list after each working cycle
"Why is Django leaking memory?" in the Django FAQ: it covers both setting DEBUG to False, which is always important, and clearing the query list with db.reset_queries(), which is important in applications like yours.

User

Answer 3 · edited 2024-04-25 20:48:53

Does the settings.py file for the daemon have DEBUG = True? If so, Django keeps an in-memory record of all the SQL it has run so far, which can lead to a memory leak.
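One way to catch this early, sketched here as a hypothetical helper (`warn_if_debug` is not a Django API), is to check the setting when the daemon starts. It accepts any object with a `DEBUG` attribute; in the daemon you would pass `django.conf.settings`.

```python
import warnings
from types import SimpleNamespace


def warn_if_debug(settings_obj):
    """Warn if a long-running process is started with DEBUG enabled.

    Returns the DEBUG value so callers can also act on it.
    """
    debug = bool(getattr(settings_obj, "DEBUG", False))
    if debug:
        warnings.warn(
            "DEBUG is True: Django records every SQL query it runs in memory; "
            "set DEBUG = False for the daemon or call django.db.reset_queries() "
            "after each cycle.",
            RuntimeWarning,
            stacklevel=2,
        )
    return debug


if __name__ == "__main__":
    # Stand-in settings object for demonstration only
    warn_if_debug(SimpleNamespace(DEBUG=True))
```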
