使用Trie的Python拼写检查器

3 投票
3 回答
2777 浏览
提问于 2025-04-17 04:36

我正在尝试用一种叫做“前缀树”的数据结构来实现一个拼写检查器。目前我有一个Node的基本结构:

class Node:
    def __init__(self):
        self.next = {}
        self.word_marker = False

    def add_item(self, string):
        #if the length of the string is 0 then return. When the end of the word 
        #comes set the word_marker to true
        if len(string) == 0:
            self.word_marker = True
            return
        #set the key to the first letter of the string and reformat the string to reflect the first letter taken out
        #ultimately going to store the letters for each node as the key
        key = string[0]
        string = string[1:]

        #if there is a key in the dictionary, then recursively call add_item with new string
        if key in self.next:
            self.next[key].add_item(string)

        else:
            node = Node()
            self.next[key] = node
            node.add_item(string)

接下来,我想写一个函数来搜索一个string,然后返回一个建议的拼写。这个函数的定义是def correct(self, string)。我该如何遍历这个前缀树来实现搜索呢?假设我已经通过定义一个根节点root,然后用add_item把单词列表中的每个单词添加到前缀树里。

3 个回答

0

虽然这不是直接的答案,但你可以考虑从一个更完善的Trie(字典树)实现开始,比如这个:

class Trie:

    def __init__(self):
        self.__final = False
        self.__nodes = {}

    def __repr__(self):
        return 'Trie<len={}, final={}>'.format(len(self), self.__final)

    def __getstate__(self):
        return self.__final, self.__nodes

    def __setstate__(self, state):
        self.__final, self.__nodes = state

    def __len__(self):
        return len(self.__nodes)

    def __bool__(self):
        return self.__final

    def __contains__(self, array):
        try:
            return self[array]
        except KeyError:
            return False

    def __iter__(self):
        yield self
        for node in self.__nodes.values():
            yield from node

    def __getitem__(self, array):
        return self.__get(array, False)

    def create(self, array):
        self.__get(array, True).__final = True

    def read(self):
        yield from self.__read([])

    def update(self, array):
        self[array].__final = True

    def delete(self, array):
        self[array].__final = False

    def prune(self):
        for key, value in tuple(self.__nodes.items()):
            if not value.prune():
                del self.__nodes[key]
        if not len(self):
            self.delete([])
        return self

    def __get(self, array, create):
        if array:
            head, *tail = array
            if create and head not in self.__nodes:
                self.__nodes[head] = Trie()
            return self.__nodes[head].__get(tail, create)
        return self

    def __read(self, name):
        if self.__final:
            yield name
        for key, value in self.__nodes.items():
            yield from value.__read(name + [key])
2

这里有一个和你问题相关的回答:https://stackoverflow.com/a/11016430/793956

另外,你也可以考虑使用一些库,比如 https://github.com/kmike/marisa-trie#readme 或者 https://github.com/kmike/datrie#readme

4

如果你还没看过,可以去看看诺维格的《如何写一个拼写纠正器》。这个文章会教你怎么做拼写检查的工具。

撰写回答