Python 3：基础拼写检查

0 投票

1 回答

1556 浏览

提问于 2025-05-01 07:27

#SPELL CHECK PROGRAM
def SpellCheck():

    import difflib
    import urllib.request

    #Downloads 10000 most used english words as txt file
    url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt'
    response = urllib.request.urlopen(url)
    data = response.read()      #
    text = data.decode('utf-8') #

    x = True
    while x == True:
        print("Enter your word: ")
        word = input(">>  ")

        clr(2)
        s(3)

        if (word.lower()) in text:
            print(word, " is spelt correctly")

        else:
            print("It doesn't seem like i know that word")

            #closewords = difflib.get_close_matches(word, text)
            wrongcount = 0
            correctcount = 0
            closewords = []

            ############

            for x in text:
                wrongcount = 0
                correctcount = 0

                for let in x:
                    for letter in word:
                        if let == letter:
                            correctcount += 1
                        else:
                            wrongcount +=1

                    if correctcount > len(word) - 4:
                        closewords.append(x)

                    else:
                        x = 0

            #############

            print("Perhaps you meant one of these: ")
            print( closewords )

        print("Would you like to try again?")
        again = input(">>  ")

        if again == 'Y' or again ==  'y' or again ==  'yes' or again ==  'Yes':
            x = True
        else:
            print("Okay")
            x = False

这个程序应该先准备一个包含10,000个最常用英语单词的列表，然后把这些单词整理成一个清单。如果用户输入的单词不在这个列表里，程序会对每个字母进行检查，看看这些字母是否和用户输入的单词中的字母有相同的地方。这样可以判断两个单词之间是否有相同的字母。如果有相同的字母，程序就会把这些相似的单词列出来，作为建议。

如果用户输入的单词拼写正确，程序会显示“拼写正确”。

如果用户输入的单词拼写错误，程序会显示“拼写错误”，但不会给出任何建议。比如说，如果我输入“pooison”，程序不会给出任何提示，尽管“poison”这个单词在字典里是存在的。

在这种情况下，程序应该输出“poison”。

暂无标签

1 个回答

关于建议的部分，别用你现在的方法。一个可能更严谨的建议相似单词的方式是使用一种叫做史密斯-沃特曼算法的对齐算法。

这个算法的基本思路是，你为字母不匹配和在一个单词中插入字母设定惩罚分数。然后，你可以简单地选择字典中那些匹配得分超过某个阈值的单词。对于长度为n和m的单词，这个算法的复杂度是O(mn)，所以对于一万个短单词来说，这个效率应该是相当不错的。

在实现方面，这个算法主要用于基因序列的近似匹配，所以大多数Python的实现都是针对这个用途的。如果你找不到通用的实现，自己写一个作为学习练习也是很有价值的。如果你需要，我可以给你一些建议吗？

示例

import numpy as np
from collections import defaultdict

def sub_function(letter1, letter2):
    
    if letter1 == letter2:
        return 1 #increase score if letter match
    else:
        return -1 # decrease score if letter mismatch

        
#######################################################        


def needleman_wunsch(si,sj,d=-2):
    #dimensions
    I =len(si)+1 ; J = len(sj)+1
        
    #define a dynamic programming matrix+backpointer matrix as a numpy array
    DP=np.zeros([len(si)+1,len(sj)+1]); PTR = DP.copy()

    #initialise top and left edges with the ga[ penalties
    for i in range(0,DP.shape[0]):
        DP[i,0]=d*(i)
    for i in range(0,DP.shape[1]):
        DP[0,i]=d*(i)
    
    #iterations over DP matrix, generate PTR matrix to all reconstruction 
    for i in range(1,I):
        for j in range(1,J):
            F_ij =[DP[i,j-1]+d,DP[i-1,j]+d,DP[i-1,j-1]+sub_function(si[i-1],sj[j-1])]
            DP[i,j]=max(F_ij)
            PTR[i,j]=F_ij.index(DP[i,j])
    
    #reconstructed strings 
    sI='';sJ=''
    l_c = [I-1,J-1]; p_c=PTR[l_c[0],l_c[1]]
    
    #main loop
    while l_c[0] >0 and l_c[1] >0:
        i=l_c[0]-1; j=l_c[1]-1 # character indices
        if PTR[l_c[0],l_c[1]] ==2:
            sI+=si[i]; sJ+=sj[j];
            l_c=[l_c[0]-1,l_c[1]-1] 
        elif PTR[l_c[0],l_c[1]] ==1:
            l_c=[l_c[0]-1,l_c[1]]
            sI+=si[i]; sJ+='-';
        elif PTR[l_c[0],l_c[1]] ==0:
            l_c=[l_c[0],l_c[1]-1]
            sI+='-'; sJ+=sj[j];
        
    #reversing strings as loop builds them backwards
    sI=sI[::-1]; sJ=sJ[::-1]
   
    return (sI,sJ,DP[-1,-1]) # a tuple of the two string+ score
    
def display(nw_tuple):
    print nw_tuple[0]
    print nw_tuple[1]
    print 'score: '+str(nw_tuple[2])
    

match= needleman_wunsch('acknowledgment','acknowlefdgment')
display(match)

回答于 2025-05-01 由 Python大师

分享举报

Python 3：基础拼写检查

1 个回答

撰写回答