Finding country names in a letter matrix

#!/usr/bin/env python
import numpy as np
import re
import os

B = [['k','l','m','a','l','t','a','l','b','s'],
     ['i','e','n','y','e','j','i','i','y','r'],
     ['o','r','o','h','w','d','r','z','u','i'],
     ['c','o','r','v','m','z','t','a','i','l'],
     ['i','p','w','b','j','q','s','r','d','a'],
     ['x','a','a','d','n','c','u','b','f','n'],
     ['e','g','y','e','h','i','a','h','w','k'],
     ['m','n','g','a','k','g','f','d','s','a'],
     ['g','i','d','n','a','l','g','n','e','y'],
     ['b','s','t','r','f','g','s','a','i','u']]

Matrix = np.matrix(B)
Shape = Matrix.shape
row = Shape[0] - 1
col = Shape[1] - 1
wordlist = []
DIRECTIONS = [(-1,-1), (0,-1), (1,-1), (-1,0), (1,0), (-1,1), (0,1), (1,1)]

def expand(i, j, xd, yd):
    # each call extends the current string by one letter in the given direction
    if (i+xd >= 0 and i+xd <= row and j+yd >= 0 and j+yd <= col):
        wordlist.append(wordlist[-1] + Matrix[i+xd, j+yd])
        expand(i+xd, j+yd, xd, yd)

print('matrix is {} x{:2}'.format(Shape[0], Shape[1]))

for i in range(Shape[0]):
    for j in range(Shape[1]):
        for xd, yd in DIRECTIONS:
            # before extending in each direction, set the last element to the letter at the current position
            # country names are from http://www.countries-list.info/Download-List
            wordlist.append(Matrix[i, j])
            expand(i, j, xd, yd)

for word in wordlist:
    # tried a regex to search the file, but it is slow compared to the system grep command
    if len(word) > 1 and (not os.system("grep -iw " + word + " /home/cmaaek/Downloads/list.txt > /dev/null")):
        print(word)

2 Answers

User

#1 · Edited 2024-04-25 20:17:49

Here is a trie implementation in Python. On a grid this small it is slower than brute-force string matching:

B = [['k','l','m','a','l','t','a','l','b','s'],
     ['i','e','n','y','e','j','i','i','y','r'],
     ['o','r','o','h','w','d','r','z','u','i'],
     ['c','o','r','v','m','z','t','a','i','l'],
     ['i','p','w','b','j','q','s','r','d','a'],
     ['x','a','a','d','n','c','u','b','f','n'],
     ['e','g','y','e','h','i','a','h','w','k'],
     ['m','n','g','a','k','g','f','d','s','a'],
     ['g','i','d','n','a','l','g','n','e','y'],
     ['b','s','t','r','f','g','s','a','i','u']]

The directions:

way={}
A=way[1,0]=sum(B,[])
way[0,1]=sum((A[i::10] for i in range(10)),[])
way[1,1]=(A*11)[::11]
way[-1,1]=(A*9)[::9]
ways='|'.join([''.join(l) for  l in way.values()])    
ways+= '|'+ ways[::-1]
# countries = ['afghanistan', ..., 'england', ...]
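One thing to be aware of in the strings above: rows, columns and diagonals are concatenated end to end, so a substring match can straddle the end of one line and the start of the next. The following is a minimal sketch of an alternative that keeps every line as its own string. The helper name `lines_of` and the 3x3 demo grid are assumptions for illustration and are not part of the answer above.

```python
def lines_of(grid):
    """Return every row, column and diagonal of a rectangular grid
    as a separate string, in both reading directions."""
    n_rows, n_cols = len(grid), len(grid[0])
    rows = [''.join(r) for r in grid]
    cols = [''.join(grid[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Cells on the same "\" diagonal share i - j; cells on the same "/" diagonal share i + j.
    diag, anti = {}, {}
    for i in range(n_rows):
        for j in range(n_cols):
            diag.setdefault(i - j, []).append(grid[i][j])
            anti.setdefault(i + j, []).append(grid[i][j])
    diags = [''.join(v) for v in diag.values()] + [''.join(v) for v in anti.values()]

    forward = rows + cols + diags
    return forward + [s[::-1] for s in forward]


# Hypothetical demo grid, small enough to check by hand.
demo = [list("abc"), list("def"), list("ghi")]
lines = lines_of(demo)
print("aei" in lines)                  # True: the main diagonal
print(any("cd" in s for s in lines))   # False: "c" ends row 0 and "d" starts row 1
```

With the real grid, the brute-force test would become something like `[w for w in countries if any(w in s for s in lines_of(B))]`.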

Brute force:

# In [26]: [w for w in countries if w in ways]
# Out[26]: ['austria', 'brazil', 'chad', 'england', 'malta',\
# 'mexico', 'norway', 'singapore']
# 
# In [27]: %timeit [w for w in countries if w in ways]
# 238 µs ± 7.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#

Trie:

from collections import defaultdict
trie = lambda : defaultdict(trie) 

def add(country,root):
    t=root
    for l in country: t=t[l]
    t[''] = country

countrie = trie()
for c in countries : add(c,countrie)  

def find(trie,way):
    for i in range(len(way)):
        t=trie
        for l in way[i:]:
            if l in t:
                t=t[l]
                if '' in t : yield t['']
            else: break


# In [28]: list(find(countrie,ways))
# Out[28]: 
# ['malta', 'norway', 'chad', 'brazil', 'austria',\
#  'singapore', 'mexico', 'england']
# 
# In [29]: %timeit list(find(countrie,ways))
# 457 µs ± 9.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

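Edit

Using a 370k-word English dictionary (linked in the original answer), brute force takes 412 ms to find 496 words. The trie approach needs only about 900 µs, roughly 450 times faster. Building the trie costs 600 ms, but you only have to build it once.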

User
#2 · Edited 2024-04-25 20:17:49

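A trie (pronounced "try") is a tree-shaped data structure. The original answer illustrated it with a diagram (not reproduced here) whose top-level nodes are A, D, N and Z.
Each path in the trie corresponds to a distinct word: following any path from the top spells a valid word. The trie is built by letting two words share a node wherever they share a letter at the same position of a common prefix. For "NEWS" and "NOT", the first character is the same, which is why they share the node "N" in the diagram. Their second characters, "E" and "O", differ, so the words branch into separate paths from there. Each node also carries an indicator that tells you whether it corresponds to the end of a word (not shown in the diagram).
This approach works because it is a very compact and efficient way to represent a dictionary. It also lets us decide quickly whether a query word is in the dictionary, because we only ever have to follow a single path to find out.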
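As a rough illustration of the structure described above, here is a minimal sketch of a trie built from nested dictionaries. The names `build_trie`, `contains` and `END` are assumptions made for this example, and only the two words mentioned in the answer ("news" and "not") are inserted.

```python
END = ''  # key marking "a word ends at this node"; it can never equal a single letter


def build_trie(words):
    """Build a trie as nested dicts: one dict per node, one key per child letter."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the child if the prefix already exists
        node[END] = True                    # end-of-word indicator
    return root


def contains(root, word):
    """Follow a single path; fail as soon as a letter has no matching child."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


root = build_trie(["news", "not"])
print(sorted(root))             # ['n']       both words share the node "n"
print(sorted(root['n']))        # ['e', 'o']  then they branch
print(contains(root, "news"))   # True
print(contains(root, "new"))    # False: a prefix, but no end-of-word indicator
print(contains(root, "video"))  # False: no top-level node matches "v"
```

A lookup touches at most one node per letter of the query, no matter how many words the trie holds.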
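You can build a trie from the list of country names and then use it to look for country names on the board. For example, looking up "Video" in the trie pictured above returns False, because none of the starting nodes (A, D, N, Z) matches the first character V. If the first character does match, you check whether any child node matches the next character, and so on. When you search from some starting character on the word-search board, this lets you eliminate options quickly.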
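The board search described above could look roughly like the following sketch. It walks from every cell in each of the eight directions and abandons a direction as soon as the trie has no matching child. The function names, the `END` marker and the small 4x4 demo board are assumptions for illustration, not code from this thread.

```python
END = ''  # end-of-word key; board cells are single letters, so it never collides

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def build_trie(words):
    """Nested-dict trie; the END key stores the complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def search_board(board, root):
    """Return the set of trie words readable in a straight line on the board."""
    found = set()
    n_rows, n_cols = len(board), len(board[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for di, dj in DIRECTIONS:
                node, r, c = root, i, j
                while 0 <= r < n_rows and 0 <= c < n_cols:
                    node = node.get(board[r][c])
                    if node is None:      # no word continues with this letter: prune
                        break
                    if END in node:       # a complete word ends here
                        found.add(node[END])
                    r, c = r + di, c + dj
    return found


# Hypothetical demo board: "chad" reads left to right, "peru" right to left.
board = ["chad",
         "xxxx",
         "urep",
         "xxxx"]
trie = build_trie(["chad", "mali", "peru"])
print(sorted(search_board(board, trie)))  # ['chad', 'peru']
```

Most start cells are rejected after a single dictionary lookup, which is where the speed-up over testing every candidate string comes from.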
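As a further optimization, you may want to record the board positions that lead to dead ends (that is, positions from which no valid country name can be formed). That lets you skip board positions that cannot contribute to a solution, which also speeds up the code. Hope this helps!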
