使用Trie的Python拼写检查器
我正在尝试用一种叫做“前缀树”的数据结构来实现一个拼写检查器。目前我有一个Node
的基本结构:
class Node:
def __init__(self):
self.next = {}
self.word_marker = False
def add_item(self, string):
#if the length of the string is 0 then return. When the end of the word
#comes set the word_marker to true
if len(string) == 0:
self.word_marker = True
return
#set the key to the first letter of the string and reformat the string to reflect the first letter taken out
#ultimately going to store the letters for each node as the key
key = string[0]
string = string[1:]
#if there is a key in the dictionary, then recursively call add_item with new string
if key in self.next:
self.next[key].add_item(string)
else:
node = Node()
self.next[key] = node
node.add_item(string)
接下来,我想写一个函数来搜索一个string
,然后返回一个建议的拼写。这个函数的定义是def correct(self, string)
。我该如何遍历这个前缀树来实现搜索呢?假设我已经通过定义一个根节点root
,然后用add_item
把单词列表中的每个单词添加到前缀树里。
3 个回答
0
虽然这不是直接的答案,但你可以考虑从一个更完善的Trie(字典树)实现开始,比如这个:
class Trie:
def __init__(self):
self.__final = False
self.__nodes = {}
def __repr__(self):
return 'Trie<len={}, final={}>'.format(len(self), self.__final)
def __getstate__(self):
return self.__final, self.__nodes
def __setstate__(self, state):
self.__final, self.__nodes = state
def __len__(self):
return len(self.__nodes)
def __bool__(self):
return self.__final
def __contains__(self, array):
try:
return self[array]
except KeyError:
return False
def __iter__(self):
yield self
for node in self.__nodes.values():
yield from node
def __getitem__(self, array):
return self.__get(array, False)
def create(self, array):
self.__get(array, True).__final = True
def read(self):
yield from self.__read([])
def update(self, array):
self[array].__final = True
def delete(self, array):
self[array].__final = False
def prune(self):
for key, value in tuple(self.__nodes.items()):
if not value.prune():
del self.__nodes[key]
if not len(self):
self.delete([])
return self
def __get(self, array, create):
if array:
head, *tail = array
if create and head not in self.__nodes:
self.__nodes[head] = Trie()
return self.__nodes[head].__get(tail, create)
return self
def __read(self, name):
if self.__final:
yield name
for key, value in self.__nodes.items():
yield from value.__read(name + [key])
2
这里有一个和你问题相关的回答:https://stackoverflow.com/a/11016430/793956。
另外,你也可以考虑使用一些库,比如 https://github.com/kmike/marisa-trie#readme 或者 https://github.com/kmike/datrie#readme。
4
如果你还没看过,可以去看看诺维格的《如何写一个拼写纠正器》。这个文章会教你怎么做拼写检查的工具。