<p>Based on John's input, I created the following routine.</p>
<p>In addition to the previous calculation, I run a separate word-by-word match and compute the mean score over all words supplied by Alexa.</p>
<p>The total score is the product of the two scores.</p>
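<p>As a rough illustration of how the two scores combine, here is a minimal sketch. The helper names <code>ratio</code> and <code>combined_score</code> are made up for this example, and the filler-word filtering is left out to keep it short, so it is a simplification rather than the full routine.</p>

```python
from difflib import SequenceMatcher
from statistics import mean


def ratio(a, b):
    # similarity of two strings in the range 0.0 .. 1.0
    return SequenceMatcher(None, a, b).ratio()


def combined_score(user_input, candidate):
    """Return (full ratio, mean word ratio, product) for one candidate.

    Hypothetical helper: assumes both strings contain at least one word.
    """
    full = ratio(user_input, candidate)
    candidate_words = candidate.split()
    # for every user word keep the best score against any candidate word
    best_per_word = [
        max(ratio(user_word, cand_word) for cand_word in candidate_words)
        for user_word in user_input.split()
    ]
    word_mean = mean(best_per_word)
    return full, word_mean, full * word_mean


full, word_mean, total = combined_score(
    "Die Gefährten", "Der Herr der Ringe - Die Gefährten"
)
print('%.5f %.5f %.5f' % (full, word_mean, total))
```

<p>Because the two factors are multiplied, a candidate only scores high if both the whole string and the individual words match well.</p>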
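<p>I also try to ignore presumed filler words based on word length. Using a very basic statistical summary (word count and median word length), I ignore all words shorter than 5, 4, or 2 characters. A dictionary might be a better solution, but I wanted to avoid that because of the multilingual environment.</p>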
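<p>To make the thresholds concrete, the sketch below shows how the cutoff could be picked for a few titles. <code>pick_min_length</code> is a hypothetical name used only here; the thresholds are the ones described above, and an empty length list is treated as median 0, which is my own assumption.</p>

```python
from statistics import median


def pick_min_length(words):
    """Return the minimum word length to keep: 5, 4 or 2 (hypothetical helper)."""
    # single-character tokens are ignored when computing the median
    lengths = [len(w) for w in words if len(w) > 1]
    med = median(lengths) if lengths else 0  # assumption: no lengths -> 0
    if len(words) >= 3 and med > 4:
        return 5
    if len(words) >= 2 and med > 3:
        return 4
    return 2


for title in ["Der Pate 2", "Das Schweigen der Lämmer", "Matrix"]:
    words = title.split()
    cutoff = pick_min_length(words)
    kept = [w for w in words if len(w) >= cutoff]
    print(cutoff, kept)
```

<p>For "Der Pate 2" the median length is 3.5, so the cutoff is 4 and only "Pate" is kept; a single-word title such as "Matrix" always keeps the default cutoff of 2.</p>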
<pre><code>from difflib import SequenceMatcher
from statistics import median, mean


def getWords(text):
    words = text.split()
    lengths = [len(x) for x in words if len(x) > 1]
    # median word length; 0 if no word is longer than one character
    medianLength = median(lengths) if lengths else 0
    # set the minimum word length based on word count
    # and median of word length to remove presumed fillers
    minLength = 2
    if len(words) >= 3 and medianLength > 4:
        minLength = 5
    elif len(words) >= 2 and medianLength > 3:
        minLength = 4
    # keep words of minimum length
    return [item for item in words if len(item) >= minLength]


matchList = ["Die Verurteilten", "Der Pate", "Der Pate 2", "The Dark Knight", "Die zwölf Geschworenen", "Schindlers Liste", "Pulp Fiction", "Der Herr der Ringe - Die Rückkehr des Königs", "Zwei glorreiche Halunken", "Fight Club", "Der Herr der Ringe - Die Gefährten", "Forrest Gump", "Das Imperium schlägt zurück", "Inception", "Der Herr der Ringe - Die zwei Türme", "Einer flog über das Kuckucksnest", "GoodFellas - Drei Jahrzehnte in der Mafia", "Matrix", "Die sieben Samurai", "Krieg der Sterne", "City of God", "Sieben", "Das Schweigen der Lämmer", "Ist das Leben nicht schön?", "Das Leben ist schön"]
userInput = "Die Gefährten"

# find the best match between the user input and the match list
maxi = 0
result = None
userWords = getWords(userInput)
for matchItem in matchList:
    # ratio of the original item comparison
    fullRatio = SequenceMatcher(None, userInput, matchItem).ratio()
    # every word of the user input will be compared
    # to each word of the list item, the maximum score
    # for each user word will be kept
    wordResults = list()
    for userWord in userWords:
        maxWordRatio = 0
        for matchWord in getWords(matchItem):
            wordRatio = SequenceMatcher(None, userWord, matchWord).ratio()
            if wordRatio > maxWordRatio:
                maxWordRatio = wordRatio
        wordResults.append(maxWordRatio)
    # the total score for each list item is the full ratio
    # multiplied by the mean of all single word scores
    # (0 if the user input contains no usable words)
    wordMean = mean(wordResults) if wordResults else 0
    itemScore = fullRatio * wordMean
    # print item result
    print('%.5f' % itemScore, matchItem)
    # keep track of maximum score
    if itemScore > maxi:
        maxi = itemScore
        result = matchItem

# award ceremony
print(result)
</code></pre>
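<p>The scored output of this routine ranks the titles better than before. Running the code prints the score for each title, followed by the best match.</p>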
<p>Extensive testing will show how well this solution really works.</p>
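<p>One possible way to start such testing is a small table of inputs and expected titles, checked against whatever scoring function is in use. Everything in this sketch is hypothetical: the names <code>best_match</code> and <code>hit_rate</code>, the test cases, and the shortened candidate list. <code>plain_ratio</code> is only a stand-in scorer and should be replaced by the combined score from the routine above.</p>

```python
from difflib import SequenceMatcher


def plain_ratio(user_input, candidate):
    # stand-in scorer (assumption); swap in the combined score to test it
    return SequenceMatcher(None, user_input, candidate).ratio()


def best_match(user_input, candidates, scorer):
    # candidate with the highest score; the first one wins on ties
    return max(candidates, key=lambda c: scorer(user_input, c))


def hit_rate(cases, candidates, scorer):
    # fraction of test cases where the expected title ranks first
    hits = sum(
        1 for user_input, expected in cases
        if best_match(user_input, candidates, scorer) == expected
    )
    return hits / len(cases)


candidates = ["Der Pate", "Der Pate 2", "Fight Club", "Matrix"]
# hypothetical test cases: (what the user said, title that should win)
cases = [
    ("Fight Club", "Fight Club"),
    ("Matrix", "Matrix"),
    ("der pate zwei", "Der Pate 2"),
]
print('hit rate: %.2f' % hit_rate(cases, candidates, plain_ratio))
```

<p>Passing the scorer as a parameter makes it easy to compare different scoring variants on the same set of cases.</p>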