How to detect deadlocks in a Django application (and resolve them)
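I maintain a Django project that frequently becomes unresponsive. So far my workaround has been to monitor the application continuously and restart Apache whenever necessary.
What do I mean by unresponsive? Apache stops answering any request at all.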
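Environment:
- OS: Debian Squeeze 64-bit
- Web server: Apache 2.2.16 with mod_wsgi (previously mod_python, for about a year)
- Django: 1.3.1 (and every major release since 1.0)
- Python: 2.6.6 + virtualenv (with distribute, no site-packages; several different configurations were in use before)
- Database adapter: psycopg2 2.3.2
- Database: PostgreSQL 9.0 (previously 8.3)
- Connection pooling: pgbouncer (the problem persists without the bouncer)
- Reverse proxy: nginx 1.0.11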
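What can I do to get closer to the root cause? (I cannot share the source code, but I can provide snippets.) I have been chasing this problem for a long time, and it is nearly impossible to list everything I have tried. I have tried to remove every "magic" setting I could think of, and several parts of the application have been rewritten since the problem first appeared.
I apologize for the lack of information, but I am happy to provide (almost) anything that is requested, and I promise to do my best to make this post as useful as possible for others who face a similar problem.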
2 Answers
1
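You may be hitting the following Django bug [1] (not yet fixed in 1.4).
Workaround: apply the fix to your Django source by hand, or wrap the wsgi handler in a thread-safe wrapper like the one below (we use this on our production systems):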
from __future__ import with_statement
from django.core.handlers.wsgi import WSGIHandler as DjangoWSGIHandler
from threading import Lock

__copyright__ = "Jibe"


class WSGIHandler(DjangoWSGIHandler):
    """
    This provides a threadsafe drop-in replacement of django's WSGIHandler.

    Initialisation of django via a multithreaded wsgi handler is not safe.
    It is vulnerable to an A-B B-A deadlock.

    When two threads bootstrap django via different urls you have a chance to hit
    the following deadlock.

    thread 1                                          thread 2
      view A                                            view B
      import file foo   import lock foo                 import file bar   import lock bar
      bootstrap django  lock AppCache.write_lock
      import file bar   import lock bar  <-- blocks
                                                        bootstrap django  lock AppCache.write_lock  <----- deadlock

    workaround for an AB BA deadlock: wrap it in a lock C.

      lock C                lock C
        lock A                lock B
        lock B                lock A
        release B             release A
        release A             release B
      release C             release C

    That's exactly what this class does, but... only for the first few calls.
    After that we remove the lock C, as the AppCache.write_lock is only held when django is booted.

    If we would not remove the lock C after the first few calls, that would make the whole app single threaded again.

    Usage:
        in your wsgi file replace the following lines
            import django.core.handlers.wsgi
            application = django.core.handlers.wsgi.WSGIHandler()
        by
            import threadsafe_wsgi
            application = threadsafe_wsgi.WSGIHandler()

    FAQ:
        Q: why would you want threading in the first place ?
        A: to reduce memory. Big apps can consume hundreds of megabytes each. adding processes is then much more expensive than threads.
           that memory is better spent caching, when threads are almost free.

        Q: this deadlock, it looks far-fetched, is this real ?
        A: yes we had this problem on production machines.
    """
    __initLock = Lock()  # lock C
    __initialized = 0    # number of requests handled while holding lock C

    def __call__(self, environ, start_response):
        # the first calls (4) we squeeze everybody through lock C
        # this basically serializes all threads
        MIN_INIT_CALLS = 4
        if self.__initialized < MIN_INIT_CALLS:
            with self.__initLock:
                ret = DjangoWSGIHandler.__call__(self, environ, start_response)
                self.__initialized += 1
                return ret
        else:
            # we are safely bootstrapped, skip lock C
            # now we are running multi-threaded again
            return DjangoWSGIHandler.__call__(self, environ, start_response)
Then use the following code in your wsgi.py file:
from threadsafe_wsgi.handlers import WSGIHandler
django_handler = WSGIHandler()
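The "lock C" idea from the docstring can be tried out in isolation, without Django. The following is only a minimal sketch: the function name `run_with_outer_lock` and the three plain `threading.Lock` objects are made up for illustration and merely stand in for the import locks and `AppCache.write_lock` mentioned above.

```python
import threading


def run_with_outer_lock(timeout=5.0):
    """Two threads take locks A and B in opposite order, serialized by lock C."""
    lock_a = threading.Lock()  # hypothetical stand-in for "lock A"
    lock_b = threading.Lock()  # hypothetical stand-in for "lock B"
    lock_c = threading.Lock()  # outer lock, plays the role of "lock C"
    finished = []

    def worker(name, first, second):
        # While a thread holds C, no other thread can be anywhere between
        # acquiring its first inner lock and releasing it again, so the
        # opposite A/B ordering can no longer interleave.
        with lock_c:
            with first:
                with second:
                    finished.append(name)

    threads = [
        threading.Thread(target=worker, args=("thread 1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("thread 2", lock_b, lock_a)),
    ]
    for t in threads:
        t.daemon = True  # do not keep the interpreter alive if something hangs
        t.start()
    for t in threads:
        t.join(timeout)
    return sorted(finished)
```

Removing the `with lock_c:` line brings back the opposite lock ordering, which may hang depending on thread timing. That is the reason the handler above keeps lock C around only for the first few requests: holding it permanently would serialize every request.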
2
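Ultimately, what you need are the new features in mod_wsgi 4.0. They give you better control over automatic restarts when requests get stuck. When that happens, mod_wsgi will attempt to dump Python stack traces, so you can see what each Python request thread was doing at the time and work out why it is blocked.
I suggest raising this on the mod_wsgi mailing list, where I can explain the new features in more detail if needed. I have also posted about them before here: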
http://groups.google.com/group/modwsgi/msg/2a968d820e18e97d
For now, the mod_wsgi 4.0 code is only available from the source repository. The current trunk is considered stable.
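Until that is available, the per-thread stack dump described above can be roughly approximated by hand inside the application. This is not the mod_wsgi feature itself, just a sketch built on the CPython-specific `sys._current_frames()`; the helper name `format_thread_stacks` is an assumption made for illustration.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current stack of every Python thread."""
    # Map thread ids to readable names where the threading module knows them.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack returns one newline-terminated string per stack entry.
        lines.extend(traceback.format_stack(frame))
        lines.append("\n")
    return "".join(lines)
```

One way to use it, assuming at least one thread is still able to run, is to write the report to the error log from a background watchdog thread or a dedicated debugging view whenever requests start to pile up. Threads blocked on the same locks usually stand out in the output.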