How to detect deadlocks in a Django application (and resolve them)

1 vote
2 answers
2439 views
Asked 2025-04-17 09:56

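I maintain a Django project that frequently becomes unresponsive. So far, my workaround has been to monitor the application continuously and restart Apache when necessary.

What do I mean by unresponsive? Apache stops answering any requests at all.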

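Environment:

  • OS: Debian Squeeze, 64-bit
  • Web server: Apache 2.2.16 with mod_wsgi (previously mod_python, for about a year)
  • Django: 1.3.1 (and every major release since 1.0)
  • Python: 2.6.6 + virtualenv (with distribute, no site-packages; several other configurations were in use before this one)
  • Database adapter: psycopg2 2.3.2
  • Database: PostgreSQL 9.0 (previously 8.3)
  • Connection pooler: pgbouncer (the problem persists without the bouncer)
  • Reverse proxy: nginx 1.0.11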

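What can I do to get closer to the root cause? (I cannot share the full source code, but I can provide snippets.) I have been chasing this problem for so long that it is almost impossible to list everything I have tried. I have tried to remove every "magic" setting I could think of, and several parts of the application have been rewritten since the problem first appeared.

I apologize for the lack of detail. I am happy to provide (almost) any information requested, and I will do my best to make this post as useful as possible to others facing a similar problem.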

2 Answers

1

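You may be hitting the following Django bug [1] (not yet fixed in 1.4).

Workaround: apply the fix manually to your Django source, or wrap the handler from the wsgi module in a thread-safe wrapper like the one below (we use this in production).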

from __future__ import with_statement
from django.core.handlers.wsgi import WSGIHandler as DjangoWSGIHandler

from threading import Lock

__copyright__ = "Jibe"

class WSGIHandler(DjangoWSGIHandler):
    """
    A thread-safe drop-in replacement for Django's WSGIHandler.

    Initialising Django through a multithreaded WSGI handler is not safe:
    it is vulnerable to an A-B / B-A deadlock.

    When two threads bootstrap Django via different URLs, there is a chance
    of hitting the following deadlock:

      thread 1 (view A)                     thread 2 (view B)
      ------------------------------------  ------------------------------------
      import file foo                       import file bar
        acquires import lock foo              acquires import lock bar
      bootstrap django
        acquires AppCache.write_lock
      import file bar
        waits for import lock bar  <-- blocks
                                            bootstrap django
                                              waits for AppCache.write_lock  <-- deadlock

    Workaround for an AB-BA deadlock: wrap both sequences in a third lock C.

        thread 1                    thread 2
        lock C                      lock C
            lock A                      lock B
            lock B                      lock A
            release B                   release A
            release A                   release B
        release C                   release C

    That is exactly what this class does, but only for the first few calls.
    After that, lock C is no longer taken, because AppCache.write_lock is
    only held while Django is booting.

    If lock C were kept after the first few calls, the whole app would
    effectively be single-threaded again.

    Usage:
        In your wsgi file, replace the following lines
                import django.core.handlers.wsgi
                application = django.core.handlers.wsgi.WSGIHandler()
        with
                import threadsafe_wsgi
                application = threadsafe_wsgi.WSGIHandler()

    FAQ:
        Q: Why would you want threading in the first place?
        A: To reduce memory. Big apps can consume hundreds of megabytes each,
           so adding processes is much more expensive than adding threads.
           That memory is better spent on caching, since threads are almost free.

        Q: This deadlock looks far-fetched. Is it real?
        A: Yes, we had this problem on production machines.
    """
    __initLock = Lock()  # lock C
    __initialized = 0

    def __call__(self, environ, start_response):
        # The first MIN_INIT_CALLS (4) calls are squeezed through lock C,
        # which serializes all threads while Django bootstraps.
        MIN_INIT_CALLS = 4
        if self.__initialized < MIN_INIT_CALLS:
            with self.__initLock:
                ret = DjangoWSGIHandler.__call__(self, environ, start_response)
                self.__initialized += 1
                return ret
        else:
            # We are safely bootstrapped; skip lock C.
            # From here on we run multi-threaded again.
            return DjangoWSGIHandler.__call__(self, environ, start_response)

Then use the following in your wsgi.py file:

from threadsafe_wsgi.handlers import WSGIHandler
django_handler = WSGIHandler()
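
The "serialize only the first few calls" idea in the wrapper above does not depend on Django. The following is a minimal, generic sketch of the same pattern so it can be tried in isolation. The class name `SerializeFirstCalls` and its `min_calls` parameter are hypothetical names chosen for illustration; they are not part of Django or mod_wsgi.

```python
import threading


class SerializeFirstCalls(object):
    """Run the first `min_calls` successful calls of `func` one at a time.

    Hypothetical helper for illustration only. It mirrors the wrapper's
    logic: take a global lock ("lock C") during warm-up, then stop taking it.
    """

    def __init__(self, func, min_calls=4):
        self._func = func
        self._min_calls = min_calls
        self._lock = threading.Lock()  # plays the role of "lock C"
        self._completed = 0            # calls finished while holding the lock

    def __call__(self, *args, **kwargs):
        if self._completed < self._min_calls:
            # Warm-up phase: only one thread at a time may run func.
            with self._lock:
                result = self._func(*args, **kwargs)
                self._completed += 1
                return result
        # Warm-up is over: call straight through, fully concurrent.
        return self._func(*args, **kwargs)
```

As in the original wrapper, the counter is read without holding the lock, so a few extra calls may still be serialized around the transition. That is harmless here, because serializing too many calls only costs a little concurrency.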

[1] https://code.djangoproject.com/ticket/18251

2

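Ultimately, what you want are the new features in mod_wsgi 4.0. They give you finer control over automatic restarts when requests get stuck. When requests do get stuck, mod_wsgi will try to dump Python stack traces, so you can see what each Python request thread was doing at that moment and work out why it was blocked.

I suggest raising this on the mod_wsgi mailing list, where I can explain the new features in more detail if needed. I have posted about this before: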
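
A rough approximation of such a per-thread dump can also be produced from inside the process with the standard library alone. The sketch below is not how mod_wsgi implements its dump; it only illustrates the idea using `sys._current_frames()` and `traceback.format_stack()`. The function name `format_thread_stacks` is a hypothetical helper, and where it gets called from (a background monitor thread, a debug-only view) is left as an assumption.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current stack of every Python thread."""
    # Map thread identifiers to names so the report is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack returns the frames from outermost to innermost call.
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)
```

Writing the result to `sys.stderr` would normally send it to the Apache error log under mod_wsgi. A thread stuck on a lock shows up with its innermost frame at the blocking call, which is usually enough to tell an import-time deadlock from a hung database query.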

http://groups.google.com/group/modwsgi/msg/2a968d820e18e97d

For now, the mod_wsgi 4.0 code is only available from the source repository. The current trunk is considered stable.
