在SQLAlchemy中从表到模型的反向映射

1 投票
2 回答
2829 浏览
提问于 2025-04-15 22:48

为了在我的基于SQLAlchemy的应用中提供活动日志,我有一个这样的模型:

class ActivityLog(Base):
    __tablename__ = 'activitylog'
    id = Column(Integer, primary_key=True)
    activity_by_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    activity_by = relation(User, primaryjoin=activity_by_id == User.id)
    activity_at = Column(DateTime, default=datetime.utcnow, nullable=False)
    activity_type = Column(SmallInteger, nullable=False)

    target_table = Column(Unicode(20), nullable=False)
    target_id = Column(Integer, nullable=False)
    target_title = Column(Unicode(255), nullable=False)

这个日志包含多个表的记录,所以我不能使用外键关系。日志条目是这样生成的:

doc = Document(name=u'mydoc', title=u'My Test Document',
               created_by=user, edited_by=user)
session.add(doc)
session.flush() # See note below
log = ActivityLog(activity_by=user, activity_type=ACTIVITY_ADD,
                  target_table=Document.__table__.name, target_id=doc.id,
                  target_title=doc.title)
session.add(log)

这让我面临三个问题:

  1. 我必须在我的doc对象获得id之前先刷新会话。如果我使用了外键列和关系映射器,我可以简单地调用ActivityLog(target=doc),让SQLAlchemy自动处理这些事情。有没有办法避免手动刷新呢?

  2. 这个target_table参数太冗长了。我想我可以通过在ActivityLog中设置一个target属性来解决这个问题,这样它可以自动从给定的实例中获取表名和ID。

  3. 最大的问题是,我不确定如何从数据库中获取模型实例。给定一个ActivityLog实例log,调用self.session.query(log.target_table).get(log.target_id)并不能工作,因为query()需要一个模型作为参数。

一个解决办法似乎是使用多态性,让我的所有模型都继承自一个ActivityLog可以识别的基础模型。像这样:

class Entity(Base):
    __tablename__ = 'entities'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255), nullable=False)
    edited_at = Column(DateTime, onupdate=datetime.utcnow, nullable=False)
    entity_type = Column(Unicode(20), nullable=False)
    __mapper_args__ = {'polymorphic_on': entity_type}

class Document(Entity):
    __tablename__ = 'documents'
    __mapper_args__ = {'polymorphic_identity': 'document'}
    body = Column(UnicodeText, nullable=False)

class ActivityLog(Base):
    __tablename__ = 'activitylog'
    id = Column(Integer, primary_key=True)
    ...
    target_id = Column(Integer, ForeignKey('entities.id'), nullable=False)
    target = relation(Entity)

如果我这样做,ActivityLog(...).target在引用一个文档时会给我一个Document实例,但我不确定为每个东西都有两个表是否值得。我要不要这样做呢?

2 个回答

1

一定要看看Ants链接的博客和示例。我觉得解释并不混乱,只是需要对这个话题有一些经验。

我可以给你提几点建议:

  • 外键(ForeignKeys):一般来说,我同意外键是个好东西,但在你的情况下,我不确定它是否真的重要。你似乎把这个ActivityLog当作一个独立的关注点来处理(类似于面向切面编程);但是如果使用外键的话,实际上会让你的业务对象对ActivityLog有所“了解”。另外,使用你现在的数据库结构,如果为了审计目的而使用外键的话,一旦你允许删除对象,外键的要求会导致删除所有与这个对象相关的ActivityLog条目。
  • 自动记录(Automatic logging):你现在每次创建、修改(或删除)对象时都在手动记录日志。使用SQLAlchemy(SA)的话,你可以实现一个SessionExtension,利用before_commit,这样就可以自动完成这个工作。

这样一来,你就完全可以避免写下面这样的代码:

log = ActivityLog(activity_by=user, activity_type=ACTIVITY_ADD,
                  target_table=Document.__table__.name, target_id=doc.id,
                  target_title=doc.title)
session.add(log)

编辑-1:已添加完整示例代码

  • 这段代码是基于http://techspot.zzzeek.org/?p=13的第一个非外键版本。
  • 选择不使用外键是因为在审计时,当主对象被删除时,不应该连带删除所有的审计日志条目。同时,这样也让可审计的对象对自己被审计的事实保持“无知”。
  • 实现中使用了SA的一对多关系。可能有些对象会被多次修改,这样就会产生很多审计日志条目。默认情况下,SA在向列表中添加新条目时会加载关系对象。假设在“正常”使用中,我们只想添加新的审计日志条目,我们使用lazy='noload'标志,这样主对象的关系就永远不会被加载。不过,从另一侧导航时会加载,并且也可以通过自定义查询从主对象加载,这在示例中也展示了,使用了activitylog_readonly只读属性。

代码(可以运行并进行一些测试):

from datetime import datetime

from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, DateTime, ForeignKey, Table, UnicodeText, Unicode, and_
from sqlalchemy.orm import relationship, dynamic_loader, scoped_session, sessionmaker, class_mapper, backref
from sqlalchemy.orm.session import Session
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.interfaces import SessionExtension

import logging
logging.basicConfig(level=logging.INFO)
_logger = logging.getLogger()

ACTIVITY_ADD = 1
ACTIVITY_MOD = 2
ACTIVITY_DEL = 3

class ActivityLogSessionExtension(SessionExtension):
    _logger = logging.getLogger('ActivityLogSessionExtension')

    def before_commit(self, session):
        self._logger.debug("before_commit: %s", session)
        for d in session.new:
            self._logger.info("before_commit >> add: %s", d)
            if hasattr(d, 'create_activitylog'):
                log = d.create_activitylog(ACTIVITY_ADD)
        for d in session.dirty:
            self._logger.info("before_commit >> mod: %s", d)
            if hasattr(d, 'create_activitylog'):
                log = d.create_activitylog(ACTIVITY_MOD)
        for d in session.deleted:
            self._logger.info("before_commit >> del: %s", d)
            if hasattr(d, 'create_activitylog'):
                log = d.create_activitylog(ACTIVITY_DEL)


# Configure test data SA
engine = create_engine('sqlite:///:memory:', echo=False)
session = scoped_session(sessionmaker(bind=engine, autoflush=False, extension=ActivityLogSessionExtension()))
Base = declarative_base()
Base.query = session.query_property()

class _BaseMixin(object):
    """ Just a helper mixin class to set properties on object creation.  
    Also provides a convenient default __repr__() function, but be aware that 
    also relationships are printed, which might result in loading relations.
    """
    def __init__(self, **kwargs):
        for k,v in kwargs.items():
            setattr(self, k, v)

    def __repr__(self):
        return "<%s(%s)>" % (self.__class__.__name__, 
            ', '.join('%s=%r' % (k, self.__dict__[k]) 
                for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])
            )

class User(Base, _BaseMixin):
    __tablename__ = u'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Document(Base, _BaseMixin):
    __tablename__ = u'documents'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255), nullable=False)
    body = Column(UnicodeText, nullable=False)

class Folder(Base, _BaseMixin):
    __tablename__ = u'folders'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255), nullable=False)
    comment = Column(UnicodeText, nullable=False)

class ActivityLog(Base, _BaseMixin):
    __tablename__ = u'activitylog'
    id = Column(Integer, primary_key=True)

    activity_by_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    activity_by = relationship(User) # @note: no need to specify the primaryjoin
    activity_at = Column(DateTime, default=datetime.utcnow, nullable=False)
    activity_type = Column(SmallInteger, nullable=False)

    target_table = Column(Unicode(20), nullable=False)
    target_id = Column(Integer, nullable=False)
    target_title = Column(Unicode(255), nullable=False)
    # backref relation for auditable
    target = property(lambda self: getattr(self, '_backref_%s' % self.target_table))

def _get_user():
    """ This method returns the User object for the current user.
    @todo: proper implementation required
    @hack: currently returns the 'user2'
    """
    return session.query(User).filter_by(name='user2').one()

# auditable support function
# based on first non-FK version from http://techspot.zzzeek.org/?p=13
def auditable(cls, name):
    def create_activitylog(self, activity_type):
        log = ActivityLog(activity_by=_get_user(),
                          activity_type=activity_type,
                          target_table=table.name, 
                          target_title=self.title,
                          )
        getattr(self, name).append(log)
        return log

    mapper = class_mapper(cls)
    table = mapper.local_table
    cls.create_activitylog = create_activitylog

    def _get_activitylog(self):
        return Session.object_session(self).query(ActivityLog).with_parent(self).all()
    setattr(cls, '%s_readonly' %(name,), property(_get_activitylog))

    # no constraints, therefore define constraints in an ad-hoc fashion.
    primaryjoin = and_(
            list(table.primary_key)[0] == ActivityLog.__table__.c.target_id,
            ActivityLog.__table__.c.target_table == table.name
    )
    foreign_keys = [ActivityLog.__table__.c.target_id]
    mapper.add_property(name, 
            # @note: because we use the relationship, by default all previous
            # ActivityLog items will be loaded for an object when new one is
            # added. To avoid this, use either dynamic_loader (http://www.sqlalchemy.org/docs/reference/orm/mapping.html#sqlalchemy.orm.dynamic_loader)
            # or lazy='noload'. This is the trade-off decision to be made.
            # Additional benefit of using lazy='noload' is that one can also
            # record DEL operations in the same way as ADD, MOD
            relationship(
                ActivityLog,
                lazy='noload',  # important for relationship
                primaryjoin=primaryjoin, 
                foreign_keys=foreign_keys,
                backref=backref('_backref_%s' % table.name, 
                    primaryjoin=list(table.primary_key)[0] == ActivityLog.__table__.c.target_id, 
                    foreign_keys=foreign_keys)
        )
    )

# this will define which classes support the ActivityLog interface
auditable(Document, 'activitylogs')
auditable(Folder, 'activitylogs')

# create db schema
Base.metadata.create_all(engine)


## >>>>> TESTS >>>>>>

# create some basic data first
u1 = User(name='user1')
u2 = User(name='user2')
session.add(u1)
session.add(u2)
session.commit()
session.expunge_all()
# --check--
assert not(_get_user() is None)


##############################
## ADD
##############################
_logger.info('-' * 80)
d1 = Document(title=u'Document-1', body=u'Doc1 some body skipped the body')
# when not using SessionExtension for any reason, this can be called manually
#d1.create_activitylog(ACTIVITY_ADD)
session.add(d1)
session.commit()

f1 = Folder(title=u'Folder-1', comment=u'This folder is empty')
# when not using SessionExtension for any reason, this can be called manually
#f1.create_activitylog(ACTIVITY_ADD)
session.add(f1)
session.commit()

# --check--
session.expunge_all()
logs = session.query(ActivityLog).all()
_logger.debug(logs)
assert len(logs) == 2
assert logs[0].activity_type == ACTIVITY_ADD
assert logs[0].target.title == u'Document-1'
assert logs[0].target.title == logs[0].target_title
assert logs[1].activity_type == ACTIVITY_ADD
assert logs[1].target.title == u'Folder-1'
assert logs[1].target.title == logs[1].target_title

##############################
## MOD(ify)
##############################
_logger.info('-' * 80)
session.expunge_all()
d1 = session.query(Document).filter_by(id=1).one()
assert d1.title == u'Document-1'
assert d1.body == u'Doc1 some body skipped the body'
assert d1.activitylogs == []
d1.title = u'Modified: Document-1'
d1.body = u'Modified: body'
# when not using SessionExtension (or it does not work, this can be called manually)
#d1.create_activitylog(ACTIVITY_MOD)
session.commit()
_logger.debug(d1.activitylogs_readonly)

# --check--
session.expunge_all()
logs = session.query(ActivityLog).all()
assert len(logs)==3
assert logs[2].activity_type == ACTIVITY_MOD
assert logs[2].target.title == u'Modified: Document-1'
assert logs[2].target.title == logs[2].target_title


##############################
## DEL(ete)
##############################
_logger.info('-' * 80)
session.expunge_all()
d1 = session.query(Document).filter_by(id=1).one()
# when not using SessionExtension for any reason, this can be called manually,
#d1.create_activitylog(ACTIVITY_DEL)
session.delete(d1)
session.commit()
session.expunge_all()

# --check--
session.expunge_all()
logs = session.query(ActivityLog).all()
assert len(logs)==4
assert logs[0].target is None
assert logs[2].target is None
assert logs[3].activity_type == ACTIVITY_DEL
assert logs[3].target is None

##############################
## print all activity logs
##############################
_logger.info('=' * 80)
logs = session.query(ActivityLog).all()
for log in logs:
    _ = log.target
    _logger.info("%s -> %s", log, log.target)

##############################
## navigate from main object
##############################
_logger.info('=' * 80)
session.expunge_all()
f1 = session.query(Folder).filter_by(id=1).one()
_logger.info(f1.activitylogs_readonly)
1

解决这个问题的一种方法是使用多态关联。这种方法可以解决你提到的三个问题,并且还能让数据库的外键约束正常工作。你可以查看SQLAlchemy源码中的多态关联示例。Mike Bayer还有一篇旧的博客文章,里面详细讨论了这个话题。

撰写回答