Python: dictionary dilemma: how to properly index objects by an attribute

Asked 2025-04-15 19:32

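First, an example:

Suppose we have a bunch of Person objects, each with various attributes: name, SSN, phone, email address, credit card number, and so on.

Now imagine the following simple website:

It uses a person's email address as the unique login name

It lets users edit their attributes (including their email address)

If this website had a lot of users, it would make sense to store the Person objects in a dictionary indexed by email address, so that a Person can be retrieved quickly at login.

However, when a Person's email address is edited, that Person's key in the dictionary has to change as well. This is slightly yucky.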
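To make the problem concrete, here is a minimal sketch. The `Person` class, the `people_by_email` dictionary and the `change_email` helper are illustrative names invented for this example. It shows the dictionary going stale after a direct assignment, and the manual re-keying needed to avoid that:

```python
class Person:
    """Hypothetical stand-in for the site's user object."""
    def __init__(self, name, email):
        self.name = name
        self.email = email

people_by_email = {}
alice = Person("Alice", "alice@old.example")
bob = Person("Bob", "bob@old.example")
for person in (alice, bob):
    people_by_email[person.email] = person

def change_email(index, person, new_email):
    """Re-key by hand: drop the old entry, update the object, add the new entry."""
    index.pop(person.email)   # must run before the attribute changes
    person.email = new_email
    index[new_email] = person

# Direct assignment bypasses the index and leaves a stale key behind.
bob.email = "bob@new.example"
stale_lookup = people_by_email.get("bob@old.example")   # still finds bob
fresh_lookup = people_by_email.get("bob@new.example")   # finds nothing

# Going through the helper keeps the object and the index in step.
change_email(people_by_email, alice, "alice@new.example")
```

Every piece of code that edits the email has to remember to go through the helper, which is exactly the part that feels fragile.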

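I'm looking for suggestions on how to tackle the generic problem:

Given a bunch of entities that share an aspect. The aspect is used both for fast access to the entities and within each entity's own functionality. Where should the aspect be placed:

within each entity (not good for fast access)
in the index only (not good for each entity's functionality)
both within each entity and in the index (duplicated data/references)
somewhere else, or handled somehow differently

The problem can be extended further, say if we want several indices over the data (SSN, credit card number, and so on). Eventually we may end up with a bunch of SQL tables.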
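As a rough illustration of that end point, the sketch below uses the standard-library `sqlite3` module with an in-memory database. The table name, column names and the `find_by` helper are made up for the example. The attribute lives in one place (the row), and the database maintains every index when the row changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE columns get an index automatically, one per lookup attribute.
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE, ssn TEXT UNIQUE)"
)
conn.execute(
    "INSERT INTO person (name, email, ssn) VALUES (?, ?, ?)",
    ("Alice", "alice@old.example", "000-00-0001"),
)

def find_by(column, value):
    """Look a person up by one of the indexed columns."""
    if column not in ("email", "ssn"):   # column name is interpolated, so whitelist it
        raise ValueError(column)
    query = "SELECT id, name FROM person WHERE %s = ?" % column
    return conn.execute(query, (value,)).fetchone()

# Editing the email is a single UPDATE; no manual re-keying is needed.
conn.execute(
    "UPDATE person SET email = ? WHERE email = ?",
    ("alice@new.example", "alice@old.example"),
)
old_row = find_by("email", "alice@old.example")   # no longer found
new_row = find_by("email", "alice@new.example")
same_row = find_by("ssn", "000-00-0001")          # other index unaffected
```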

I'm looking for something with the following properties (plus any others you can think of):

# create an index on the attribute of a class
magical_index = magical_index_factory(SomeClass, SomeClass.attribute)
# create an object
obj = SomeClass()
# set the object's attribute
obj.attribute = value
# retrieve the object using the attribute as index
magical_index[value]
# change the object's attribute to a new value
obj.attribute = new_value
# automagically, the object can be retrieved using the new value of the attribute
magical_index[new_value]
# become less materialistic: get rid of the objects in your life
del obj
# object is really gone
magical_index[new_value]
# KeyError: new_value

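I'd like the objects and the indices to work well together without getting in each other's way.

Please suggest suitable design patterns.

Note: the example above is just that, an example, used to illustrate the generic problem. So please offer generic solutions (of course, you may keep using the example when explaining your generic solution).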


2 Answers

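Well, another way might be to implement the following:

Attribute is an abstraction of a "value". We need it because Python has no "assignment overloading" (a simple get/set scheme is the cleanest alternative). Attribute also acts as an "Observable".
AttrSet is an "Observer" of Attribute objects: it tracks their value changes while effectively acting as a dictionary from attribute value to arbitrary objects (persons, in our case).
create_with_attrs is a factory producing something that looks like a named tuple and forwards attribute access to the supplied Attribute objects, so that person.name = "Ivan" effectively becomes person.name_attr.set("Ivan") and makes every AttrSet observing this person rearrange its internals accordingly.

Here is the tested code: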

from collections import defaultdict

class Attribute(object):
    def __init__(self, value):
        super(Attribute, self).__init__()
        self._value = value
        self._notified_set = set()
    def set(self, value):
        old = self._value
        self._value = value
        for n_ch in self._notified_set:
            n_ch(old_value=old, new_value=value)
    def get(self):
        return self._value
    def add_notify_changed(self, notify_changed):
        self._notified_set.add(notify_changed)
    def remove_notify_changed(self, notify_changed):
        self._notified_set.remove(notify_changed)

class AttrSet(object):
    def __init__(self):
        super(AttrSet, self).__init__()
        self._attr_value_to_obj_set = defaultdict(set)
        self._obj_to_attr = {}
        self._attr_to_notify_changed = {}
    def add(self, attr, obj):
        self._obj_to_attr[obj] = attr
        self._add(attr.get(), obj)
        notify_changed = (lambda old_value, new_value:
                          self._notify_changed(obj, old_value, new_value))
        attr.add_notify_changed(notify_changed)
        self._attr_to_notify_changed[attr] = notify_changed
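    def get(self, *attr_value_lst):
        # With no arguments, return the objects for every indexed value.
        attr_value_lst = attr_value_lst or list(self._attr_value_to_obj_set)
        result = set()
        for attr_value in attr_value_lst:
            # .get() avoids inserting an empty set for values that are not indexed
            result.update(self._attr_value_to_obj_set.get(attr_value, ()))
        return result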
    def remove(self, obj):
        attr = self._obj_to_attr.pop(obj)
        self._remove(attr.get(), obj)
        notify_changed = self._attr_to_notify_changed.pop(attr)
        attr.remove_notify_changed(notify_changed)
    def __iter__(self):
        return iter(self.get())
    def _add(self, attr_value, obj):
        self._attr_value_to_obj_set[attr_value].add(obj)
    def _remove(self, attr_value, obj):
        obj_set = self._attr_value_to_obj_set[attr_value]
        obj_set.remove(obj)
        if not obj_set:
            self._attr_value_to_obj_set.pop(attr_value)
    def _notify_changed(self, obj, old_value, new_value):
        self._remove(old_value, obj)
        self._add(new_value, obj)

def create_with_attrs(**attr_name_to_attr):
    # Build a one-off object whose attribute access is forwarded to the
    # supplied Attribute instances.
    class Result(object):
        def __getattr__(self, attr_name):
            if attr_name in attr_name_to_attr:
                return attr_name_to_attr[attr_name].get()
            raise AttributeError(attr_name)
        def __setattr__(self, attr_name, attr_value):
            if attr_name in attr_name_to_attr:
                attr_name_to_attr[attr_name].set(attr_value)
            else:
                raise AttributeError(attr_name)
        def __str__(self):
            return "".join("%s: %s, " % (attr_name, attr.get())
                           for attr_name, attr in attr_name_to_attr.items())
    return Result()

With the data prepared as follows:

name_and_email_lst = [("John","email1@dot.com"),
                      ("John","email2@dot.com"),
                      ("Jack","email3@dot.com"),
                      ("Hack","email4@dot.com"),
                      ]

email = AttrSet()
name = AttrSet()

for name_str, email_str in name_and_email_lst:
    email_attr = Attribute(email_str)
    name_attr = Attribute(name_str)
    person = create_with_attrs(email=email_attr, name=name_attr)
    email.add(email_attr, person)
    name.add(name_attr, person)

def print_set(person_set):
    for person in person_set:
        print(person)
    print("")

The following sequence of pseudo-SQL snippets, each followed by its Python equivalent, gives:

SELECT id FROM email

>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: email1@dot.com, name: John,

SELECT id FROM email WHERE email="email1@dot.com"

>>> print_set(email.get("email1@dot.com"))
email: email1@dot.com, name: John,

SELECT id FROM email WHERE email="email1@dot.com" OR email="email2@dot.com"

>>> print_set(email.get("email1@dot.com", "email2@dot.com"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,

SELECT id FROM name WHERE name="John"

>>> print_set(name.get("John"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,

SELECT id FROM name, email WHERE name="John" AND email="email1@dot.com"

>>> print_set(name.get("John").intersection(email.get("email1@dot.com")))
email: email1@dot.com, name: John,

UPDATE email, name SET email="jon@dot.com", name="Jon"
WHERE id IN
SELECT id FROM email WHERE email="email1@dot.com"

>>> person = email.get("email1@dot.com").pop()
>>> person.name = "Jon"; person.email = "jon@dot.com"
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: jon@dot.com, name: Jon,

DELETE FROM email, name WHERE id=%s

SELECT id FROM email

>>> name.remove(person)
>>> email.remove(person)
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,

Answered 2025-04-15 by Python大师


Consider this.

class Person(object):
    def __init__(self, name, addr, email):  # ... etc. ...
        self.observer = []  # observers to notify when an attribute changes
        self._name = name
        # ... etc. ...
    @property
    def name(self):
        return self._name
    @name.setter
    def name(self, value):
        self._name = value
        for observer in self.observer:
            observer.update(self)
    # ... etc. ...

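The observer attribute makes Person an Observable that notifies its Observers of updates. It is the list of observers that need to be told about changes.

Each attribute is wrapped in a property. Using a descriptor is probably better, because it saves repeating the observer-notification code in every setter.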
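A minimal sketch of the descriptor variant, assuming Python 3.6+ (for `__set_name__`). The names `Observed` and `Recorder` are invented for this example, and the observer's `update` here receives the attribute name and the old and new values, which goes beyond the single-argument `update(self)` used above:

```python
class Observed:
    """Descriptor: stores the value on the instance and notifies observers on set."""
    def __set_name__(self, owner, name):
        self.name = name
        self.storage = "_" + name      # e.g. "email" is stored as "_email"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        old = getattr(instance, self.storage, None)
        setattr(instance, self.storage, value)
        for observer in instance.observer:
            observer.update(instance, self.name, old, value)

class Person:
    name = Observed()                  # the notification loop is written once
    email = Observed()

    def __init__(self, name, email):
        self.observer = []             # must exist before the first observed assignment
        self.name = name
        self.email = email

class Recorder:
    """Hypothetical observer that only records what it is told."""
    def __init__(self):
        self.events = []

    def update(self, person, attr, old, new):
        self.events.append((attr, old, new))

p = Person("John", "email1@dot.com")
rec = Recorder()
p.observer.append(rec)
p.email = "jon@dot.com"
p.name = "Jon"
```

Passing the old value along means an index can remove the stale entry directly instead of searching for it.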

import collections

class PersonCollection(set):
    def __init__(self, *args, **kw):
        self.byName = collections.defaultdict(list)
        self.byEmail = collections.defaultdict(list)
        super(PersonCollection, self).__init__(*args, **kw)
    def add(self, person):
        super(PersonCollection, self).add(person)  # sets have add(), not append()
        person.observer.append(self)
        self.byName[person.name].append(person)
        self.byEmail[person.email].append(person)
    def update(self, person):
        """This person changed.  Find them in old indexes and fix them."""
        # Note: this shadows set.update().
        for index, new_key in ((self.byName, person.name),
                               (self.byEmail, person.email)):
            # Each index value is a list of people, so search inside the lists.
            for k, people in list(index.items()):
                if any(p is person for p in people):
                    people[:] = [p for p in people if p is not person]
                    if not people:
                        index.pop(k)
            index[new_key].append(person)

    # ... etc. ... for all methods of a collections.Set.

To see what else needs to be implemented, look at the collections ABCs (in Python 3 they live in collections.abc).

http://docs.python.org/library/collections.html#abcs-abstract-base-classes

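If you want "generic" indexing, your collection can be parameterized with the names of the attributes, and you can use getattr to fetch those named attributes from the underlying objects.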

import collections

class GenericIndexedCollection(set):
    attributes_to_index = []  # List of attribute names
    def __init__(self, *args, **kw):
        # One value -> list-of-objects index per attribute name.
        self.indexes = dict((n, collections.defaultdict(list))
                            for n in self.attributes_to_index)
        super(GenericIndexedCollection, self).__init__(*args, **kw)
    def add(self, person):
        super(GenericIndexedCollection, self).add(person)
        for i in self.indexes:
            self.indexes[i][getattr(person, i)].append(person)
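Note: to emulate a database properly, use a set rather than a list. Database tables are (in theory) sets. In practice they are unordered, and an index lets the database reject duplicates. Some relational database management systems (RDBMSs) don't reject duplicate rows because, without an index, checking for duplicates is too expensive.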
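A small sketch of an index that rejects duplicates, in the spirit of a UNIQUE index. `UniqueIndex` is a name invented for this example, and raising ValueError on a duplicate is an arbitrary choice:

```python
class UniqueIndex:
    """Maps each attribute value to at most one object."""
    def __init__(self, attribute_name):
        self.attribute_name = attribute_name
        self._by_value = {}

    def add(self, obj):
        value = getattr(obj, self.attribute_name)
        existing = self._by_value.get(value)
        if existing is not None and existing is not obj:
            # The dict lookup makes the duplicate check cheap.
            raise ValueError("duplicate %s: %r" % (self.attribute_name, value))
        self._by_value[value] = obj

    def __getitem__(self, value):
        return self._by_value[value]

class Person:
    def __init__(self, name, email):
        self.name = name
        self.email = email

by_email = UniqueIndex("email")
first = Person("John", "email1@dot.com")
second = Person("Jack", "email1@dot.com")   # same email as first
by_email.add(first)

rejected = False
try:
    by_email.add(second)
except ValueError:
    rejected = True
```

For the login-by-email case this is the behaviour you would want anyway, since two accounts must not share a login name.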

Answered 2025-04-15 by Python大师
