How many common English words of 4 letters or more can be made from the letters of a given word (each letter used only once)?


Asked 2025-04-17 10:19

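On the back of a block calendar I found the following riddle:

How many common English words of four letters or more can you make from the letters of the word "textbook" (each letter can only be used once)?

The first solution I came up with was: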

from itertools import permutations

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

found_words = []

ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))

for p in ps:
    for word in map(''.join, p):
        if word in words and word != given_word:
            found_words.append(word)
print set(found_words)

This gave the result set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook']), but took more than 7 minutes on my machine.

My next attempt was:

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]

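This returned an answer almost immediately, but the result was: ['book', 'oboe', 'text', 'toot']

What is the fastest, correct and most Pythonic solution to this problem?

(Edit: I added my earlier permutations solution and its different output.)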


6 Answers

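The code below checks each word in the dictionary to see whether it has a suitable length, and then whether it can be built from the letters of "textbook". I borrowed the permutation check from "Checking if two strings are permutations of each other in Python", but modified it slightly.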

given_word = 'textbook'

with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]

matches = []
for word in words:
    if word != given_word and 4 <= len(word) <= len(given_word):
        if all(word.count(char) <= given_word.count(char) for char in word):
            matches.append(word)
print sorted(matches)

This finishes almost instantly and gives the correct result.
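The per-letter count comparison above amounts to a multiset-containment test. As a rough sketch of the same idea (written for Python 3, unlike the Python 2 code in this thread), it could be expressed with collections.Counter. The names can_form, find_words and sample_words are made up for illustration, and the short word list is only a stand-in for /usr/share/dict/words.

```python
from collections import Counter


def can_form(word, given_word):
    """Return True if word uses no letter more often than given_word has it."""
    # Counter subtraction drops zero and negative counts, so an empty
    # result means every letter of word is covered by given_word.
    return not (Counter(word) - Counter(given_word))


def find_words(words, given_word, min_len=4):
    """Return the sorted words that can be built from given_word's letters."""
    return sorted(
        w for w in set(words)
        if min_len <= len(w) <= len(given_word)
        and w != given_word
        and can_form(w, given_word)
    )


# Hypothetical stand-in for the contents of /usr/share/dict/words.
sample_words = ['book', 'boot', 'text', 'took', 'toot', 'tooth', 'box', 'textbook']
print(find_words(sample_words, 'textbook'))
```

Each dictionary word is examined once, so the work grows with the size of the word list rather than with the number of permutations of the given word.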

Answered 2025-04-17 by Python大师


How about this?

from itertools import permutations, chain

with open('/usr/share/dict/words') as fp:
    words = set(fp.read().split())

given_word = 'textbook'

perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)

print matches

This gives

>>> print matches
set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
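To see why this variant is quick even though it still enumerates permutations, it may help to count the candidates. The sketch below (Python 3; candidate_strings and the tiny words set are illustrative assumptions, not part of the answer) generates the same strings and intersects them with a set. The question's first attempt ran `word in words` against a list for every candidate, which scans the whole list each time, whereas a set lookup takes constant time on average.

```python
from itertools import chain, permutations


def candidate_strings(given_word, min_len=4):
    """Yield every arrangement of min_len or more letters of given_word."""
    lengths = range(min_len, len(given_word) + 1)
    perms = chain.from_iterable(permutations(given_word, n) for n in lengths)
    return (''.join(p) for p in perms)


given_word = 'textbook'
candidates = list(candidate_strings(given_word))
distinct = set(candidates)  # repeated 't' and 'o' produce duplicate strings

# Hypothetical stand-in for the dictionary, stored as a set.
words = {'book', 'boot', 'text', 'took', 'toot', 'zebra'}

print(len(candidates))           # 109200 arrangements of 4 to 8 letters
print(sorted(words & distinct))  # set intersection instead of list scans
```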

Answered 2025-04-17 by Python大师


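I wanted to share a slightly more interesting trick, although it takes a good deal more code and isn't really "Pythonic". Judging by the run times of the other approaches, though, it should be fairly quick.

We do a bit of preprocessing to speed up the computation. The basic idea is this: assign every letter of the alphabet a prime number, e.g. A = 2, B = 3, and so on. Then compute a hash for every word in the dictionary, which is simply the product of the primes of its characters. Finally, store each word in a dictionary keyed by that hash.

Now, to find out which words are equivalent to, say, textbook, we only have to compute the same hash for that word and look it up in the dictionary. Usually (say, in C++) we would have to worry about overflow, but in Python it is even simpler than that: every word in the list stored under the same key contains exactly the same characters.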
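The reason the lookup is exact is unique prime factorization: two products are equal only if they are built from the same primes with the same multiplicities, i.e. the same letters with the same counts. A minimal illustration in Python 3, assuming lowercase a-z only (FIRST_26_PRIMES, PRIME_OF and prime_hash are names invented for this sketch, and the letter-to-prime mapping differs from the one used in the answer's code):

```python
from string import ascii_lowercase

# The first 26 primes, one per letter: a = 2, b = 3, c = 5, ...
FIRST_26_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                   43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = dict(zip(ascii_lowercase, FIRST_26_PRIMES))


def prime_hash(word):
    """Multiply the primes of all letters; anagrams get the same product."""
    product = 1
    for char in word.lower():
        product *= PRIME_OF[char]  # raises KeyError for anything outside a-z
    return product


print(prime_hash('book') == prime_hash('koob'))  # True: same letters, same counts
print(prime_hash('book') == prime_hash('boot'))  # False: 'k' versus 't'
```

Python integers have arbitrary precision, so the product cannot overflow, however long the word.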

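The code below allows for a small optimization: in our case we only have to care about characters that also appear in the given word, so we could get by with a much smaller prime table (the obvious optimization would be to assign values only to characters that appear in the word). It was fast enough anyway, so I didn't bother, and this way we can preprocess once and reuse the result for several words. A prime generator is useful in many situations, so you should have one of your own anyway ;)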

from collections import defaultdict

# gen_primes is not defined in this answer: plug in any prime generator.
# It has to supply at least 59 primes so that every index up to
# ord('Z') - ord(' ') == 58 is covered.
PRIMES = list(gen_primes(256)) # some arbitrary prime generator

def get_dict(path):
    # Map each hash to the list of dictionary words that share it.
    res = defaultdict(list)
    with open(path, "r") as word_file:
        for line in word_file:
            word = line.strip().upper()
            res[compute_hash(word)].append(word)
    return res

def compute_hash(word):
    # Product of the primes assigned to each character of the word.
    product = 1
    for char in word:
        try:
            product *= PRIMES[ord(char) - ord(' ')]
        except IndexError:
            # contains some character out of range - always 0 for our purposes
            return 0
    return product

def powerset(s):
    # All subsequences of s (letter order preserved), including the empty string.
    if not s:
        return [s]
    rest = powerset(s[1:])
    return rest + [s[:1] + y for y in rest]

def get_result(path, given_word):
    words = get_dict(path)
    given_word = given_word.upper()
    result = set()
    # Every dictionary word whose hash equals that of some subset of at
    # least 4 letters is an anagram of that subset.
    for subset in powerset(given_word):
        if len(subset) >= 4:
            result.update(words[compute_hash(subset)])
    return result

if __name__ == '__main__':
    path = "dict.txt"
    given_word = "textbook"
    result = get_result(path, given_word)
    print(result)

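It runs fairly quickly on my Ubuntu word list (98k words), but I wouldn't call it Pythonic, since it is basically a port of a C++ algorithm. It is useful if you want to compare more than one word this way.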
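The code leaves gen_primes undefined ("some arbitrary prime generator"). One possible stand-in is sketched below in Python 3. It assumes that gen_primes(n) is meant to yield the first n primes; that reading is a guess, though it is the one under which every index up to ord('Z') - ord(' ') = 58 stays inside the PRIMES table.

```python
from itertools import count


def gen_primes(n):
    """Yield the first n primes (assumed meaning of the answer's gen_primes)."""
    found = []
    for candidate in count(2):
        if len(found) == n:
            return
        # Trial division by the primes found so far, up to sqrt(candidate).
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate


PRIMES = list(gen_primes(256))
print(PRIMES[:10])
print(len(PRIMES))
```

Trial division is slow for large tables, but 256 primes are produced instantly, and the table is only built once.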

Answered 2025-04-17 by Python大师

