How can I make a Python dataclass hashable without making it immutable?

Posted 2024-03-29 02:34:18

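Say I have a dataclass in Python 3. I want to be able to hash and order these objects. I do not want them to be immutable.

I only want to order/hash on id.

I see in the docs that I can implement __hash__ and all that, but I'd like to get dataclasses to do the work for me, since that is what they are intended to handle.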

from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

a = sorted(list(set([ Category(id='x'), Category(id='y')])))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'

3 Answers

TL;DR

Use frozen=True in conjunction with eq=True (this will make the instances immutable).

Long answer

From the docs:

__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.

By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
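
The three rules quoted above can be checked directly. This is a minimal sketch; the class names FrozenPoint, MutablePoint and IdentityPoint are made up for illustration and do not come from the question:

```python
from dataclasses import dataclass

# Hypothetical example classes, one per rule.
@dataclass(eq=True, frozen=True)
class FrozenPoint:
    x: int

@dataclass(eq=True)  # frozen defaults to False
class MutablePoint:
    x: int

@dataclass(eq=False)
class IdentityPoint:
    x: int

# eq=True, frozen=True: a field-based __hash__ is generated,
# so equal instances hash equally.
frozen_hashes_match = hash(FrozenPoint(1)) == hash(FrozenPoint(1))

# eq=True, frozen=False: __hash__ is set to None (unhashable).
mutable_hash = MutablePoint.__hash__

# eq=False: __hash__ is left untouched, so object's identity-based one is used.
uses_object_hash = IdentityPoint.__hash__ is object.__hash__

print(frozen_hashes_match, mutable_hash, uses_object_hash)
```

The second case is the one the question runs into: eq=True with the default frozen=False.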

Since you set eq=True and left frozen at its default (False), your dataclass is unhashable.

You have 3 options:

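  • Set frozen=True (in addition to eq=True), which will make your class immutable and hashable
  • Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, so problems can arise if an instance of your class is modified while it is stored in a dict or set:

    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    
    print(cat in categories)  # False
    
  • Implement a __hash__ method manually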
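
For the third option, here is a minimal sketch of a hand-written __hash__ that uses only id, assuming the Category class from the question. Since dataclass() does not change an explicitly defined __hash__(), the class stays mutable and keeps the generated ordering methods:

```python
from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

    def __hash__(self):
        # Hash on id only; name has compare=False, so __eq__ ignores it too.
        return hash(self.id)

# Duplicates (same id) collapse in the set; sorting uses id.
cats = sorted(set([Category(id='y'), Category(id='x'), Category(id='x')]))
print([c.id for c in cats])  # ['x', 'y']

# name is not part of the hash, so changing it does not break lookups.
lookup = set(cats)
cats[0].name = 'renamed'
print(cats[0] in lookup)  # True
```

The caveat from the second option still applies: changing id while an instance is stored in a set or dict will break lookups.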

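I'd like to add a special note about using unsafe_hash.

You can exclude fields from the hash by setting compare=False or hash=False on them. (By default, hash inherits its setting from compare.)

This can be useful if you store nodes in a graph but want to mark nodes as visited without breaking their hashing (for example, if they are in a set of unvisited nodes...).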

from dataclasses import dataclass, field
@dataclass(unsafe_hash=True)
class node:
    x:int
    visit_count: int = field(default=10, compare=False)  # hash inherits compare setting. So valid.
    # visit_count: int = field(default=False, hash=False)   # also valid. Arguably easier to read, but can break some compare code.
    # visit_count: int = False   # if mutated, hashing breaks. (3* printed)

s = set()
n = node(1)
s.add(n)
if n in s: print("1* n in s")
n.visit_count = 11
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")
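
The same idea can be applied to the Category class from the question. This is a sketch rather than a recommendation: unsafe_hash=True is assumed to be acceptable here because id is not expected to change while an instance is in a set or dict:

```python
from dataclasses import dataclass, field

@dataclass(order=True, unsafe_hash=True)
class Category:
    id: str
    # compare=False also excludes name from the generated __hash__.
    name: str = field(default="set this in post_init", compare=False)

a = sorted(set([Category(id='y'), Category(id='x')]))
print([c.id for c in a])  # ['x', 'y']

seen = {a[0]}
a[0].name = 'renamed'  # not part of the hash, so the lookup still works
print(a[0] in seen)    # True
```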

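This took me hours to figure out... Useful further reading I found is the Python documentation on dataclasses. See in particular the field documentation and the dataclass argument documentation. https://docs.python.org/3/library/dataclasses.html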
