SQLAlchemy连接错误

Question

我遇到了一些奇怪的错误，似乎是因为Sqlalchemy使用的连接引起的，但我不太能确定具体原因。我希望有人能给我一些线索，看看这里发生了什么。

我们正在使用Pyramid（版本1.5b1）和Sqlalchemy（版本0.9.6）来处理所有的数据库连接。有时候我们会遇到与数据库连接或会话相关的错误，大多数情况下是cursor already closed或This Connection is closed的错误，但我们也会遇到其他相关的异常：

(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed

A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further

没有什么简单的方法可以重现这些错误，只有通过多次刷新才能在某个时刻出现。所以我写了一个脚本，使用multi-mechanize同时请求不同的URL，看看在哪里和何时会发生错误。

似乎触发错误的URL并不重要，错误发生在有多个请求同时进行，并且这些请求持续的时间较长（中间还有其他请求被处理）。这似乎表明存在某种线程问题；可能是会话或连接在不同的线程之间共享。

在网上搜索这些问题时，我发现了很多讨论，大多数都建议使用作用域会话，但我们实际上已经在使用它们：

db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()

我们有一个BaseModel来处理所有的ORM对象：

BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
我们使用pyramid_tm中间件来处理请求期间的事务
我们将db_session.remove()挂钩到pyramid的NewResponse事件（这个事件在所有处理完成后触发）。我还尝试将它放在pyramid_tm之后的一个单独中间件中，或者根本不放，但这些似乎都没有效果，所以我觉得将它放在响应事件中是最干净的做法。
我们在Pyramid项目的主入口创建引擎，使用NullPool，并将连接池交给pgbouncer。我们还在这里配置了会话和BaseModel的绑定：

engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool) db_session.configure(bind=engine, query_cls=FilterQuery) BaseModel.metadata.bind = engine config.add_subscriber(cleanup_db_session, NewResponse) return config.make_wsgi_app()
在我们的应用中，我们通过以下方式访问所有数据库操作：

from project.db import db_session ... db_session.query(MyModel).filter(...) db_session.execute(...)
我们使用psycopg2==2.5.2来处理与Postgres的连接，中间通过pgbouncer。
我确保没有地方保存db_session或连接的引用（这可能导致其他线程重用它们）。

我还尝试了使用不同的Web服务器进行压力测试，使用waitress和cogen时很容易就出现了错误，而使用wsgiref时显然没有错误（因为它是单线程的）。使用uwsgi和gunicorn（4个工作进程，gevent）时没有遇到任何错误。

考虑到使用的Web服务器的不同，我觉得这可能与某些Web服务器使用线程处理请求，而某些则使用新进程（可能是分叉问题）有关？更复杂的是，随着时间的推移，我进行了一些新的测试，问题在waitress中消失了，但现在在gunicorn中出现了（使用gevent时）！我完全不知道该如何调试这个问题...

最后，为了测试连接发生了什么，我在光标执行开始时给连接附加了一个属性，并尝试在执行结束时读取这个属性：

@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
  conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
  print conn.pdtb_start_timer

令人惊讶的是，这有时会引发一个异常：'Connection'对象没有属性'pdtb_start_timer'

这让我觉得很奇怪。我找到了一些类似的讨论： https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ 我尝试在引擎中添加strategy='threadlocal'，据我了解，这应该强制每个线程使用一个连接。但这对我看到的错误没有任何影响……（除了因为我需要两个不同的会话/连接进行一些测试而导致一些单元测试失败，这强制将一个连接关联起来）

有没有人知道这里可能发生了什么，或者有什么更好的建议来解决这个问题？

提前谢谢！

Matthijs Blaas

orm sqlalchemy database connection session management thread safety connection pooling pyramid framework pgbouncer

SQLAlchemy连接错误

1 个回答

撰写回答