如何优化此Python代码以生成worddistance为1的所有单词？

2024-05-14 11:24:51 发布

男 | 程序猿一只，喜欢编程写python代码。

分析显示，这是我编写的一个小型文字游戏代码中速度最慢的部分：

def distance(word1, word2):
    difference = 0
    for i in range(len(word1)):
        if word1[i] != word2[i]:
            difference += 1
    return difference

def getchildren(word, wordlist):
    return [ w for w in wordlist if distance(word, w) == 1 ]

注：

distance()的调用次数超过了500万次，其中大部分来自getchildren，它应该得到单词列表中与word相差1个字母的所有单词。
wordlist被预筛选为只有与word包含相同数量字母的单词，因此可以保证word1和word2具有相同数量的字符。
我对Python还不太熟悉（3天前就开始学习了），所以对命名约定或其他风格的评论也很受欢迎。
对于wordlist，使用“2+2lemma.txt”文件获取12dict word list

结果：

谢谢大家，结合不同的建议，我现在使程序运行速度提高了两倍（除了我在询问之前自己做的优化之外，速度比最初的实现大约提高了4倍）

我用两组输入进行了测试，我称之为A和B

优化1：遍历word1,2的索引。。。从

for i in range(len(word1)):
        if word1[i] != word2[i]:
            difference += 1
    return difference

使用zip(word1, word2)迭代字母对

for x,y in zip (word1, word2):
        if x != y:
            difference += 1
    return difference

输入A的执行时间从11.92到9.18，输入B的执行时间从79.30到74.59

优化2：除了distance方法之外，还为differs添加了一个单独的方法（在a*试探法中，我仍然需要这个方法）

def is_neighbors(word1,word2):
    different = False
    for c1,c2 in zip(word1,word2):
        if c1 != c2:
            if different:
                return False
            different = True
    return different

输入A的执行时间从9.18到8.83，输入B的执行时间从74.59到70.14

优化3：这里的大赢家是使用izip，而不是zip

输入A的执行时间从8.83到5.02，输入B的执行时间从70.14到41.69

我也许可以用低级语言写得更好，但我现在对此很满意。谢谢大家！

再次编辑：更多结果使用Mark的方法检查第一个字母不匹配的情况，将其从5.02->；3.59和41.69->；29.82降下来

在此基础上，我合并了izip，而不是range，最终得到了这样的结果：

def is_neighbors(word1,word2):
    if word1[0] != word2[0]:
        return word1[1:] == word2[1:]
    different = False
    for x,y in izip(word1[1:],word2[1:]):
        if x != y:
            if different:
                return False
            different = True
    return different

这使得纽约时报从3.59->；3.38和29.82->；27.88下跌了一点

更多的结果！

尝试Sumudu的建议，即我从“word”中生成一个包含1个字母的所有字符串的列表，然后检查哪些字符串在word list中，而不是使用I s_neighbor函数，我得到的结果是：

def one_letter_off_strings(word):
    import string
    dif_list = []
    for i in xrange(len(word)):
        dif_list.extend((word[:i] + l + word[i+1:] for l in string.ascii_lowercase if l != word[i]))
    return dif_list

def getchildren(word, wordlist):
    oneoff = one_letter_off_strings(word)
    return ( w for w in oneoff if w in wordlist )

结果是速度变慢了（3.38->；3.74和27.88->；34.40），但似乎很有希望。一开始我以为我需要优化的部分是“一个字母串”，但是分析显示不是这样的，而缓慢的部分实际上是

( w for w in oneoff if w in wordlist )

我想如果我关掉“一个”和“单词列表”并用另一种方式进行比较，当我想到我在寻找两个列表的交集时，会不会有什么不同。我将其替换为在字母上设置交集：

return set(oneoff) & set(wordlist)

砰！3.74->；0.23和34.40->；2.25

这真是惊人，与我最初天真的实现速度完全不同： 23.79->；0.23和180.07->；2.25，因此大约比原始实现快80到100倍。

如果有人感兴趣，我写了一篇博客文章describing the program和describing the optimizations包括一篇这里没有提到的（因为它在另一段代码中）。

大辩论：

好吧，我和Unknown正在进行一场大辩论，你可以在his answer的评论中看到。他声称，如果将其移植到C中，则使用原始方法（使用isúneighbor而不是使用集合）会更快。我尝试了2个小时来获取一个我编写的C模块，并在尝试遵循this和this示例后成功地实现了链接，看起来Windows中的过程有点不同？我不知道，但我放弃了那。无论如何，这里是程序的完整代码，文本文件来自使用“2+2lemma.txt”文件的12dict word list。抱歉，如果代码有点混乱，这只是我一起黑的东西。另外，我忘了从单词列表中去掉逗号，所以这实际上是一个错误，为了相同的比较，您可以留下来，或者通过在cleanentries中的字符列表中添加逗号来修复它。

from itertools import izip
def unique(seq):  
    seen = {} 
    result = [] 
    for item in seq: 
        if item in seen:
            continue 
        seen[item] = 1 
        result.append(item) 
    return result
def cleanentries(li):
    pass
    return unique( [w.strip('[]') for w in li if w != "->"] )
def distance(word1, word2):
    difference = 0
    for x,y in izip (word1, word2):
        if x != y:
            difference += 1
    return difference
def is_neighbors(word1,word2):
    if word1[0] != word2[0]:
        return word1[1:] == word2[1:]
    different = False
    for x,y in izip(word1[1:],word2[1:]):
        if x != y:
            if different:
                return False
            different = True
    return different
def one_letter_off_strings(word):
    import string
    dif_list = []
    for i in xrange(len(word)):
        dif_list.extend((word[:i] + l + word[i+1:] for l in string.ascii_lowercase if l != word[i]))
    return dif_list

def getchildren(word, wordlist):
    oneoff = one_letter_off_strings(word)
    return set(oneoff) & set(wordlist)
def AStar(start, goal, wordlist):
    import Queue
    closedset = []
    openset = [start]
    pqueue = Queue.PriorityQueue(0)
    g_score = {start:0}         #Distance from start along optimal path.
    h_score = {start:distance(start, goal)}
    f_score = {start:h_score[start]}
    pqueue.put((f_score[start], start))
    parent_dict = {}
    while len(openset) > 0:
        x = pqueue.get(False)[1]
        if x == goal:
            return reconstruct_path(parent_dict,goal)
        openset.remove(x)
        closedset.append(x)
        sortedOpen = [(f_score[w], w, g_score[w], h_score[w]) for w in openset]
        sortedOpen.sort()
        for y in getchildren(x, wordlist):
            if y in closedset:
                continue
            temp_g_score = g_score[x] + 1
            temp_is_better = False
            appended = False
            if (not y in openset): 
                openset.append(y)
                appended = True
                h_score[y] = distance(y, goal)
                temp_is_better = True
            elif temp_g_score < g_score[y] :
                temp_is_better = True
            else :
                pass
            if temp_is_better:
                parent_dict[y] = x
                g_score[y] = temp_g_score
                f_score[y] = g_score[y] + h_score[y]
                if appended :
                    pqueue.put((f_score[y], y))
    return None


def reconstruct_path(parent_dict,node):
     if node in parent_dict.keys():
         p = reconstruct_path(parent_dict,parent_dict[node])
         p.append(node)
         return p
     else:
         return []        

wordfile = open("2+2lemma.txt")
wordlist = cleanentries(wordfile.read().split())
wordfile.close()
words = []
while True:
    userentry = raw_input("Hello, enter the 2 words to play with separated by a space:\n ")
    words = [w.lower() for w in userentry.split()]
    if(len(words) == 2 and len(words[0]) == len(words[1])):
        break
print "You selected %s and %s as your words" % (words[0], words[1])
wordlist = [ w for w in wordlist if len(words[0]) == len(w)]
answer = AStar(words[0], words[1], wordlist)
if answer != None:
    print "Minimum number of steps is %s" % (len(answer))
    reply = raw_input("Would you like the answer(y/n)? ")
    if(reply.lower() == "y"):
        answer.insert(0, words[0])
        print "\n".join(answer)
    else:
        print "Good luck!"
else:
    print "Sorry, there's no answer to yours"
reply = raw_input("Press enter to exit")

我把isu neighbors方法留在了中，尽管它没有使用。这是建议移植到C的方法。要使用它，只需将getchildren替换为：

def getchildren(word, wordlist):
    return ( w for w in wordlist if is_neighbors(word, w))

至于让它作为一个C模块工作，我没有走那么远，但这是我想到的：

#include "Python.h"

static PyObject *
py_is_neighbor(PyObject *self, Pyobject *args)
{
    int length;
    const char *word1, *word2;
    if (!PyArg_ParseTuple(args, "ss", &word1, &word2, &length))
        return NULL;

    int i;
    int different = 0;
    for (i =0; i < length; i++)
    {
        if (*(word1 + i) != *(word2 + i))
        {
            if (different)
            {
                return Py_BuildValue("i", different);
            }
            different = 1;
        }
    }
    return Py_BuildValue("i", different);
}

PyMethodDef methods[] = {
    {"isneighbor", py_is_neighbor, METH_VARARGS, "Returns whether words are neighbors"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initIsNeighbor(void)
{
    Py_InitModule("isneighbor", methods);
}

我用以下方法分析了这个问题：

python -m cProfile "Wordgame.py"

记录的时间是AStar方法调用的总时间。快速输入集是“诗诗人”，而长输入集是“诗诗人”。不同机器之间的计时显然会有所不同，因此如果有人最终尝试这样做，请按原样比较程序和C模块的结果。

Tags： in for len return if is def word

0条回答

目前没有回答

如何优化此Python代码以生成worddistance为1的所有单词？

相关问题更多 >

编程相关推荐

热门问题

热门文章

如何优化此Python代码以生成worddistance为1的所有单词？

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >