How can I ensure that an attribute value is unique in Python?

Posted 2024-04-20 16:11:15


I am scraping a website that contains a list of people. The same person can appear more than once, and several people can share the same name:

Tommy Atkins (id:312)
Tommy Atkins (id:183)
Tommy Atkins (id:312)

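I want to create one object per person and discard the duplicates.

At the moment I use a list comprehension to loop over all class instances and check whether the key is already in use. Is there a simpler way?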

class Object:
    def __init__(self, key):
        if [object for object in objects if object.key == key]:
            raise Exception('key {} already exists'.format(key))
        else: self.key = key

objects = []
objects.append(Object(1))
objects.append(Object(1)) # Exception: key 1 already exists

2 Answers

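Define `__eq__` and `__hash__` on the class so that instances are compared by the value of `key` and that same value is used to compute the hash. Then use a set instead of a list, since a set filters out duplicates automatically and efficiently: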

class Object:
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.key == other.key 
        return NotImplemented

    def __ne__(self, other):
        return not type(self).__eq__(self, other)

    def __hash__(self):
        return hash(self.key)


objects = set()
o1 = Object(1)
o2 = Object(1)
objects.add(o1)
objects.add(o2)

print (o1, o2)   # <__main__.Object object at 0x105996ba8> <__main__.Object object at 0x105996be0>
print (objects)  # {<__main__.Object object at 0x105996ba8>}
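
Applied to the scraped names from the question, the same idea might look like the sketch below. The `Person` class, its `person_id` attribute, and the `scraped` list are hypothetical names chosen for illustration, and the sketch assumes the id alone identifies a person.

```python
class Person:
    """Hypothetical stand-in for a scraped person; equality is decided by person_id only."""

    def __init__(self, name, person_id):
        self.name = name
        self.person_id = person_id

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.person_id == other.person_id
        return NotImplemented

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.person_id)

    def __repr__(self):
        return "Person({!r}, id={})".format(self.name, self.person_id)


# Assumed input: (name, id) pairs as they might come out of the scraper.
scraped = [("Tommy Atkins", 312), ("Tommy Atkins", 183), ("Tommy Atkins", 312)]

# The set comprehension drops the second occurrence of id 312.
people = {Person(name, pid) for name, pid in scraped}
unique_ids = sorted(p.person_id for p in people)
print(unique_ids)  # [183, 312]
```

Two people who share a name but have different ids stay separate, because the name plays no part in `__eq__` or `__hash__`.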

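Do not keep a permanent reference to each instance in a variable, otherwise the duplicates will never be garbage collected. In the loop below, `ins` is rebound on every iteration, so a duplicate that was not added to the set loses its last reference and is freed (note that the immediate reuse of memory addresses shown here applies only to CPython):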

objects = set()

for _ in range(5):
    ins = Object(1)
    print(id(ins))
    objects.add(ins)

Output:

4495640448 # First instance and this is now stored in the set
           # hence it is not going to be garbage collected. 
4495640840 # Python is now using new memory space.
4495640896 # Right now 4495640840 is still owned by the 
           # previous instance, hence use new memory address
           # But after this assignment the instance at 4495640840 
           # has no more references, i.e ins now points to 4495640896
4495640840 # Re-use 4495640840
4495640896 # Repeat...
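
Rather than reading memory addresses, one way to observe the same behaviour is with weak references. This is only a sketch: the `Keyed` class is a hypothetical copy of the class above, and the result relies on CPython's reference counting (the `gc.collect()` call is there as a nudge for other implementations, with no guarantee).

```python
import gc
import weakref


class Keyed:
    """Hypothetical copy of the answer's class: equality and hash come from key."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.key == other.key
        return NotImplemented

    def __hash__(self):
        return hash(self.key)


objects = set()
refs = []

for _ in range(3):
    ins = Keyed(1)
    refs.append(weakref.ref(ins))  # a weak reference does not keep ins alive
    objects.add(ins)               # only the first instance is actually stored

del ins        # drop the last strong reference to the final duplicate
gc.collect()   # not needed on CPython; kept as a hint for other runtimes

# True where the instance still exists, False where it has been freed.
alive = [ref() is not None for ref in refs]
print(alive)
```

On CPython only the first entry should be `True`: that instance lives in the set, while the two duplicates were never stored and lose their last reference.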

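A global store of ids is fine, but it is better to use a set rather than a list, because a membership test on a set is O(1) on average, whereas `i in []` on a list is O(N).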
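
A minimal sketch of that suggestion, keeping the question's behaviour of raising on a duplicate key. The class name `UniqueKeyObject` and the class-level `_seen_keys` set are assumptions made for illustration, not part of the original code.

```python
class UniqueKeyObject:
    # Hypothetical class-level registry of keys already handed out.
    _seen_keys = set()

    def __init__(self, key):
        # Set membership is a hash lookup, so this does not scan every instance.
        if key in UniqueKeyObject._seen_keys:
            raise ValueError("key {!r} already exists".format(key))
        UniqueKeyObject._seen_keys.add(key)
        self.key = key


first = UniqueKeyObject(1)

try:
    UniqueKeyObject(1)
except ValueError as exc:
    duplicate_error = str(exc)

print(duplicate_error)  # key 1 already exists
```

Unlike the list comprehension in the question, the registry stores only keys, so it does not keep the instances themselves alive. Keys are never removed in this sketch, so a key stays reserved even after its object is gone.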
