怎么能让这个Python拼字游戏找词器更快呢？

3条回答

网友

1楼 · 编辑于 2024-05-19 21:14:21

import trie


def walk_trie(trie_node, rack, path=""):
    if trie_node.value is None:
        yield path
    for i in xrange(len(rack)):
        sub_rack = rack[:i] + rack[i+1:]
        if trie_node.nodes.has_key(rack[i]):
            for word in walk_trie(trie_node.nodes[rack[i]], sub_rack, path+rack[i]):
                yield word


if __name__ == "__main__":
    print "Generating trie... "

    # You might choose to skip words starting with a capital
    # rather than lower-casing and searching everything. Capitalised
    # words are probably pronouns which aren't allowed in Scrabble

    # I've skipped words shorter than 3 characters.
    all_words = ((line.strip().lower(), None) for line in open("/usr/share/dict/words") if len(line.strip()) >= 3)
    word_trie = trie.Trie(mapping=all_words)
    print "Walking Trie... "
    print list(walk_trie(word_trie.root, "abcdefg"))

生成trie需要一段时间，但是一旦生成，获取单词列表应该比循环遍历列表快得多。

如果有人知道序列化trie的方法，那将是一个很好的补充。

只是为了证明生成trie需要花费时间。。。

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    98333    5.344    0.000    8.694    0.000 trie.py:87(__setitem__)
   832722    1.849    0.000    1.849    0.000 trie.py:10(__init__)
   832721    1.501    0.000    1.501    0.000 {method 'setdefault' of 'dict' objects}
    98334    1.005    0.000    1.730    0.000 scrabble.py:16(<genexpr>)
        1    0.491    0.491   10.915   10.915 trie.py:82(extend)
   196902    0.366    0.000    0.366    0.000 {method 'strip' of 'str' objects}
    98333    0.183    0.000    0.183    0.000 {method 'lower' of 'str' objects}
    98707    0.177    0.000    0.177    0.000 {len}
   285/33    0.003    0.000    0.004    0.000 scrabble.py:4(walk_trie)
      545    0.001    0.000    0.001    0.000 {method 'has_key' of 'dict' objects}
        1    0.001    0.001   10.921   10.921 {execfile}
        1    0.001    0.001   10.920   10.920 scrabble.py:1(<module>)
        1    0.000    0.000    0.000    0.000 trie.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {open}
        1    0.000    0.000    0.000    0.000 trie.py:5(Node)
        1    0.000    0.000   10.915   10.915 trie.py:72(__init__)
        1    0.000    0.000    0.000    0.000 trie.py:33(Trie)
        1    0.000    0.000   10.921   10.921 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 trie.py:1(NeedMore)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

网友

2楼 · 编辑于 2024-05-19 21:14:21

您可以使用/usr/dict/share/words字典进行排序，这样就可以跳过字典中的许多单词，而不必考虑它们。

例如，假设一个字典单词以“a”开头，而机架中没有“a”。您可以在单词列表中对以“B”开头的第一个单词进行二进制搜索，并跳过中间的所有单词。这在大多数情况下都会有很大的不同-你可能会跳过一半的单词。

网友

3楼 · 编辑于 2024-05-19 21:14:21

在不过分偏离基本代码的情况下，下面是一些相当简单的优化：

首先，将你的单词阅读器改为：

def word_reader(filename, L):
  L2 = L+2
  # returns an iterator
  return (word.strip() for word in open(filename) \
          if len(word) < L2 and len(word) > 2)

并称之为

words = word_reader('/usr/share/dict/words', len(rack))

这是我所建议的所有更改中最大的改进。它消除了在我们在这个过程中走得太远之前太长或太短的单词。请记住，在我的比较中，word是未压缩的新行字符。我假设是“\n”行分隔符。另外，列表中的最后一个单词可能有问题，因为它的结尾可能没有新行，但在我的计算机上，最后一个单词是词组，无论如何我们的方法都找不到。当然，你可以事先创建自己的字典，从原始字典中删除那些无效的：那些长度不对或字母大小超过a-z的字典

接下来，费兰提出了一个机架设置变量，这是一个好主意。然而，你也会在每一个单词的编排上慢下来。使用这些镜头的目的是为了剔除很多根本没有镜头的镜头，从而加快拍摄速度。不过，我发现，在调用可拼写之前，检查单词的第一个字母是否在机架中更快：

rackset = frozenset(rack)
scored =  [(score_word(word), word) for word in words if word[0] in rackset \
           and spellable(word, rack)]

然而，这必须伴随着拼写的改变。我改为：

def spellable(word, rack):
    return all( [rack.count(letter) >= word.count(letter) \
                 for letter in set(word)] )

即使没有上一步的改变，也比你现在拥有的要快。

通过上面的三个修改，代码比我的简单测试快了大约3倍。

使用更好的算法

因为你真正要做的是寻找字谜，所以使用字谜字典是有意义的。一本字谜字典把字典里的每个字都取出来，如果它们是字谜，就把它们分组。例如，“takes”和“skate”是彼此的anagrams，因为它们在排序时都等于“aekst”。我创建了一个anagram字典作为一个文本文件，其格式是每行组成一个条目。每个条目都有排序版本的anagrams，然后是anagrams本身。例如我使用的条目

aekst skate takes

然后我就可以把这些字母组合起来，在anagram字典中对每一个字母进行二进制搜索，看看是否匹配。对于7个字母的机架，最多有120个字母的唯一拼字有效组合。执行二进制搜索是O（log（N））所以速度非常快。

我分两部分实现了算法。第一个是拼字词典，第二个是真正的拼字作弊程序。

Anagram字典创建者代码

f = open('/usr/share/dict/words')
d = {}
lets = set('abcdefghijklmnopqrstuvwxyz\n')
for word in f:
  if len(set(word) - lets) == 0 and len(word) > 2 and len(word) < 9:
    word = word.strip()
    key = ''.join(sorted(word))
    if key in d:
      d[key].append(word)
    else:
      d[key] = [word]
f.close()
anadict = [' '.join([key]+value) for key, value in d.iteritems()]
anadict.sort()
f = open('anadict.txt','w')
f.write('\n'.join(anadict))
f.close()

拼字作弊代码

from bisect import bisect_left
from itertools import combinations
from time import time

def loadvars():
  f = open('anadict.txt','r')
  anadict = f.read().split('\n')
  f.close()
  return anadict

scores = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2, 
         "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3, 
         "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1, 
         "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4, 
         "x": 8, "z": 10}

def score_word(word):
  return sum([scores[c] for c in word])

def findwords(rack, anadict):
  rack = ''.join(sorted(rack))
  foundwords = []
  for i in xrange(2,len(rack)+1):
    for comb in combinations(rack,i):
      ana = ''.join(comb)
      j = bisect_left(anadict, ana)
      if j == len(anadict):
        continue
      words = anadict[j].split()
      if words[0] == ana:
        foundwords.extend(words[1:])
  return foundwords

if __name__ == "__main__":
  import sys
  if len(sys.argv) == 2:
    rack = sys.argv[1].strip()
  else:
    print """Usage: python cheat_at_scrabble.py <yourrack>"""
    exit()
  t = time()
  anadict = loadvars()
  print "Dictionary loading time:",(time()-t)
  t = time()
  foundwords = set(findwords(rack, anadict))
  scored = [(score_word(word), word) for word in foundwords]
  scored.sort()
  for score, word in scored:
    print "%d\t%s" % (score,word)
  print "Time elapsed:", (time()-t)

在我的机器上，造字法词典的创作者花了大约半秒钟的时间。当字典已经创建好时，运行scrabble作弊程序比OP的代码快15倍，比OP的代码快5倍。此外，加载字典的启动时间比从机架中实际搜索单词的时间要长得多，因此这是同时执行多个搜索的更好方法。

相关问题更多 >

编程相关推荐

热门问题

热门文章