在Python中找到字符串拆分的所有列表排列
我有一串字母,想把它们分成所有可能的组合(字母的顺序必须保持不变),这样:
s = 'monkey'
就变成:
combinations = [['m', 'onkey'], ['mo', 'nkey'], ['m', 'o', 'nkey'] ... etc]
有什么好主意吗?
6 个回答
5
这个想法是要明白,一个字符串 s
的 排列 是一个集合,这个集合包含了 s
本身,以及 s
的每个子字符串 X
和 s\X
的排列的集合。举个例子,假设我们要排列字符串 key
:
{'key'} # 这就是 'key' 本身
{'k', 'ey'} # 子字符串 'k' 和 'ey' 的第一个排列结合 = {'e', 'y'}
{'k', 'e', 'y'} # 子字符串 'k' 和 'ey' 的第二个排列结合 = {'ey'}
{'ke', 'y'} # 子字符串 'ke' 和 'y' 的唯一排列结合 = {'y'}
- 把第1、2、3、4步的结果合在一起,就得到了字符串
key
的所有排列。
有了这个思路,我们可以实现一个简单的算法:
>>> def permute(s):
result = [[s]]
for i in range(1, len(s)):
first = [s[:i]]
rest = s[i:]
for p in permute(rest):
result.append(first + p)
return result
>>> for p in permute('monkey'):
print(p)
['monkey']
['m', 'onkey']
['m', 'o', 'nkey']
['m', 'o', 'n', 'key']
['m', 'o', 'n', 'k', 'ey']
['m', 'o', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'ke', 'y']
['m', 'o', 'nk', 'ey']
['m', 'o', 'nk', 'e', 'y']
['m', 'o', 'nke', 'y']
['m', 'on', 'key']
['m', 'on', 'k', 'ey']
['m', 'on', 'k', 'e', 'y']
['m', 'on', 'ke', 'y']
['m', 'onk', 'ey']
['m', 'onk', 'e', 'y']
['m', 'onke', 'y']
['mo', 'nkey']
['mo', 'n', 'key']
['mo', 'n', 'k', 'ey']
['mo', 'n', 'k', 'e', 'y']
['mo', 'n', 'ke', 'y']
['mo', 'nk', 'ey']
['mo', 'nk', 'e', 'y']
['mo', 'nke', 'y']
['mon', 'key']
['mon', 'k', 'ey']
['mon', 'k', 'e', 'y']
['mon', 'ke', 'y']
['monk', 'ey']
['monk', 'e', 'y']
['monke', 'y']
9
http://wordaligned.org/articles/partitioning-with-python 里有一篇关于序列分割的有趣文章,下面是他们使用的实现代码:
#!/usr/bin/env python
# From http://wordaligned.org/articles/partitioning-with-python
from itertools import chain, combinations
def sliceable(xs):
'''Return a sliceable version of the iterable xs.'''
try:
xs[:0]
return xs
except TypeError:
return tuple(xs)
def partition(iterable):
s = sliceable(iterable)
n = len(s)
b, mid, e = [0], list(range(1, n)), [n]
getslice = s.__getitem__
splits = (d for i in range(n) for d in combinations(mid, i))
return [[s[sl] for sl in map(slice, chain(b, d), chain(d, e))]
for d in splits]
if __name__ == '__main__':
s = "monkey"
for i in partition(s):
print i
这段代码会输出:
['monkey']
['m', 'onkey']
['mo', 'nkey']
['mon', 'key']
['monk', 'ey']
['monke', 'y']
['m', 'o', 'nkey']
['m', 'on', 'key']
['m', 'onk', 'ey']
['m', 'onke', 'y']
['mo', 'n', 'key']
['mo', 'nk', 'ey']
['mo', 'nke', 'y']
['mon', 'k', 'ey']
['mon', 'ke', 'y']
['monk', 'e', 'y']
...
['mo', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'k', 'e', 'y']
14
def splitter(str):
for i in range(1, len(str)):
start = str[0:i]
end = str[i:]
yield (start, end)
for split in splitter(end):
result = [start]
result.extend(split)
yield result
combinations = list(splitter(str))
请注意,我默认使用了生成器,这样可以避免在处理很长的字符串时耗尽内存。