Unexpected behavior of python set.__contains__

8 votes

3 answers

1423 views

Asked 2025-04-17 03:04

Quoting the docstring of __contains__:

print set.__contains__.__doc__
x.__contains__(y) <==> y in x.

For basic objects such as int and basestring, this method seems to work fine. But for my own objects that define the __ne__ and __eq__ methods, I get unexpected behavior. Here is sample code:

class CA(object):
  def __init__(self,name):
    self.name = name

  def __eq__(self,other):
    if self.name == other.name:
      return True
    return False

  def __ne__(self,other):
    return not self.__eq__(other)

obj1 = CA('hello')
obj2 = CA('hello')

theList = [obj1,]
theSet = set(theList)

# Test 1: list
print (obj2 in theList)  # prints True

# Test 2: set (weird)
print (obj2 in theSet)  # prints False, unexpected

# Test 3: iterating over the set
found = False
for x in theSet:
  if x == obj2:
    found = True

print found   # prints True

# Test 4: converting the set to a list
print (obj2 in list(theSet))  # prints True

So is this a bug or a feature?

Tags: set operations, method overloading, custom objects, basic data types, unexpected behavior

3 answers

A set hashes its elements so that lookups are fast. You need to override the __hash__ method so that an equal element can be found:

class CA(object):
  # __init__, __eq__ and __ne__ as in the question
  def __hash__(self):
    return hash(self.name)

A list, on the other hand, does not use hashing: it compares every element one by one, just as your for loop does.
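To make the difference between the two lookup strategies concrete, here is a minimal Python 3 sketch. Because Python 3 would make a class that defines __eq__ unhashable, the hypothetical class IdHashed restores the identity-based hash explicitly with `__hash__ = object.__hash__`, mimicking the Python 2 default the question runs into; the hypothetical subclass Fixed adds the name-based __hash__ suggested in this answer.

```python
class IdHashed:
    """Equality by name, hash by identity (mimics the Python 2 default)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, IdHashed) and self.name == other.name

    # Defining __eq__ would set __hash__ to None in Python 3,
    # so explicitly keep the identity-based hash inherited from object.
    __hash__ = object.__hash__


class Fixed(IdHashed):
    """Same equality, but equal objects now also share a hash."""

    def __hash__(self):
        return hash(self.name)


a, b = IdHashed("hello"), IdHashed("hello")
print(a == b)      # True: __eq__ compares names
print(b in [a])    # True: list membership scans and calls == on each item
print(b in {a})    # False: the set compares hashes first, and they differ

c, d = Fixed("hello"), Fixed("hello")
print(d in {c})    # True: hashes match, then __eq__ confirms the match
```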

Answered 2025-04-17 by Python大师

This happens because CA does not implement the __hash__ method.

A reasonable implementation would be:

def __hash__(self):
    return hash(self.name)
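One related caveat, based on standard Python 3 semantics rather than on this answer: the question's code is Python 2 (note `print found`). In Python 3, a class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances are unhashable and `set(theList)` raises TypeError instead of silently returning False. A minimal sketch; the class names NoHash and WithHash are hypothetical.

```python
class NoHash:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, NoHash) and self.name == other.name
    # No __hash__ here: Python 3 implicitly sets NoHash.__hash__ to None.


class WithHash(NoHash):
    # Equal objects (same name) produce equal hashes.
    def __hash__(self):
        return hash(self.name)


print(NoHash.__hash__ is None)   # True

raised = False
try:
    set([NoHash("hello")])
except TypeError as exc:
    raised = True
    print("TypeError:", exc)     # the message mentions an unhashable type

print(WithHash("hello") in {WithHash("hello")})   # True
```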

Answered 2025-04-17 by Python大师

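For set and dict you need to define a method called __hash__. Any two objects that compare equal must also have the same hash value; otherwise set and dict will not behave consistently or as expected.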
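To illustrate what "inconsistent" means in practice, here is a minimal sketch with a deliberately broken, hypothetical class BadHash: its __eq__ compares names, but its hash comes from object identity, as in the question under Python 2. The set ends up holding two elements that compare equal, and a dict lookup with an equal key misses.

```python
class BadHash:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.name == other.name

    # Broken on purpose: identity-based hash, so equal objects hash differently.
    __hash__ = object.__hash__


x, y = BadHash("hello"), BadHash("hello")
s = {x, y}
print(x == y, len(s))   # True 2: two "equal" elements coexist in the set

d = {x: "value"}
print(y in d)           # False: the hashes differ, so __eq__ is never called
```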

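I suggest defining a _key method and referring to it wherever you need to compare the parts of an item, the same way your __ne__ calls __eq__ instead of reimplementing the comparison: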

class CA(object):
  def __init__(self,name):
    self.name = name

  def _key(self):
    return type(self), self.name

  def __hash__(self):
    return hash(self._key())

  def __eq__(self,other):
    # Objects of other types have no _key method, so they are never equal
    if not isinstance(other, CA):
      return False
    return self._key() == other._key()

  def __ne__(self,other):
    return not self.__eq__(other)
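In Python 3 the same _key idea needs a little less code, because the default __ne__ already delegates to __eq__ and inverts its result. A minimal sketch under that assumption, which also returns NotImplemented for objects of other types so that comparing a CA with, say, a string does not raise AttributeError:

```python
class CA:
    def __init__(self, name):
        self.name = name

    def _key(self):
        # Everything that defines equality and hashing for this class.
        return type(self), self.name

    def __eq__(self, other):
        if not isinstance(other, CA):
            return NotImplemented   # let Python try the other operand
        return self._key() == other._key()

    def __hash__(self):
        # Derived from the same key as __eq__, so equal objects hash equally.
        return hash(self._key())
    # No __ne__ needed: Python 3 derives != from __eq__.


obj1, obj2 = CA("hello"), CA("hello")
print(obj2 in {obj1})    # True
print(obj1 != obj2)      # False
print(obj1 == "hello")   # False: both sides return NotImplemented, so == falls back to identity
```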

Answered 2025-04-17 by Python大师
