Itertools: merging two lists to get all possible combinations

Posted 2024-03-29 13:33:45


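I have two lists: a and b.

a is a list of three or more strings, and b is a list of separators.

I need to generate every possible ordering of a, and then "merge" each result with every possible combination of the separators in b (see the example below for a better understanding).

I ended up using the following code: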

from itertools import permutations, combinations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

output = []

for com in combinations(b, len(a) - 1):
    for per in product(com, repeat=len(a) - 1):
        for ear_per in permutations(a):
            out = ''.join(map(''.join, zip(list(ear_per[:-1]), per))) + list(ear_per)[-1]
            output.append(out)

# For some reason the algorithm is generating duplicates
output = list(dict.fromkeys(output))

for o in output:
    print(o)

Here is a sample of the output (it is correct, and it is what I need in this case):

timestamp.customfilename
filenamecustom.timestamp
custom_filenametimestamp
timestamp_custom_filename
timestamp-filename.custom
custom_filename-timestamp
filename.timestamp-custom
. . .
filename.custom.timestamp
filename-customtimestamp
custom-timestamp_filename
filename_custom-timestamp
filename.timestampcustom
timestampcustom-filename
custom-timestamp.filename
filenamecustom_timestamp
timestamp.custom_filename
custom.timestampfilename
timestampfilename.custom
customfilename_timestamp
filenametimestamp-custom
custom-filenametimestamp
timestampfilename-custom
timestamp-custom-filename
custom.filenametimestamp
customfilenametimestamp
timestampfilename_custom
custom_filename.timestamp
custom-timestamp-filename
custom-timestampfilename
filename_timestamp.custom
. . .
filename.custom-timestamp
timestamp_filenamecustom
custom_timestampfilename
timestamp.custom.filename
timestamp.filename-custom
filename-custom-timestamp
customfilename.timestamp
filename_timestamp_custom
timestamp_filename.custom
customtimestampfilename
filenamecustomtimestamp
custom.timestamp_filename
filename_customtimestamp
. . .
timestamp-customfilename
filename_custom.timestamp

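This algorithm has two main problems:

  1. It generates some duplicate lines, so I always have to remove them afterwards (which is slow on larger datasets).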

  2. If len(a) > len(b) + 1, the script produces no output at all. In that case I need to repeat separators so that they cover the len(a) - 1 gaps between the words in a.
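Both problems can be traced to how combinations and product are nested in the code above. The following sketch is an illustration added for clarity, not part of the original question: it rebuilds only the separator tuples that the nested loops produce, and then checks what combinations returns when more separators are requested than b contains. The names slots, sep_tuples and longer_a are made up for this example.

```python
from itertools import combinations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
slots = len(a) - 1  # number of gaps between consecutive words

# Separator tuples built the same way as the two outer loops in the question.
sep_tuples = [per
              for com in combinations(b, slots)
              for per in product(com, repeat=slots)]

print(len(sep_tuples))               # how many tuples are generated
print(len(set(sep_tuples)))          # how many of them are distinct
print(sep_tuples.count(("_", "_")))  # how often one repeated-separator tuple appears

# Hypothetical longer word list: 6 words need 5 separators, but b has only 4.
longer_a = ["w1", "w2", "w3", "w4", "w5", "w6"]
print(list(combinations(b, len(longer_a) - 1)))
```

A tuple such as ("_", "_") is produced once for every combination that contains "_", which is where the duplicates come from. The last line prints an empty list, so the outer loop body never runs for the longer word list.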


2 answers

This may be a workable solution. It interleaves the permutations of a (3 * 2 * 1 == 6) with the product of b (taken 2 at a time here, 4 * 4 == 16), giving 6 * 16 == 96 results in total.

from itertools import permutations, chain, zip_longest, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

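i = 0
for perm in permutations(a):
    # One separator per gap between words; product lets a separator repeat.
    for prod in product(b, repeat=len(a) - 1):
        # zip_longest pads the shorter separator tuple with '' after the last word.
        tpls = list(chain.from_iterable(zip_longest(perm, prod, fillvalue='')))
        print(''.join(tpls))
        i += 1
print(i)  # 96 for the lists above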
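The same idea can be wrapped in a generator, so that the results can be consumed lazily or collected into a list. This is a sketch under the assumption that every ordering of the words and every choice of separators (with repetition) is wanted, and that the word list is not empty. The function name interleave_all is hypothetical.

```python
from itertools import permutations, product

def interleave_all(words, seps):
    """Yield every ordering of words joined by every choice of separators."""
    slots = len(words) - 1  # one separator per gap between words
    for perm in permutations(words):
        for chosen in product(seps, repeat=slots):
            # zip stops at the shorter input, so the last word gets no separator here.
            pieces = [word + sep for word, sep in zip(perm, chosen)]
            yield "".join(pieces) + perm[-1]

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
results = list(interleave_all(a, b))
print(len(results), len(set(results)))
```

Because product reuses separators, this also works when a has many more words than b has separators, which covers the second problem in the question.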

You may be looking for:

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
count = 0

def print_sequence(sol_words, sol_seps):
  global count 
  print("".join([sol_words[i] + sep for (i, sep) in enumerate(sol_seps)] + [sol_words[-1]]))
  count += 1

def backtrack_seps(sol_words, seps, sol_seps, i):
  for (si, sep) in enumerate(seps):
    sol_seps[i] = sep

    if i == len(sol_words) - 2:
      print_sequence(sol_words, sol_seps)
    else:
      backtrack_seps(sol_words, seps, sol_seps, i + 1)

def bt_for_sep(sol_words, seps):
  backtrack_seps(sol_words, seps, [''] * (len(sol_words) - 1), 0)

def backtrack_words(active, words, seps, sol_words, i):
  for (wi, word) in enumerate(words):
    if active[wi]:
      sol_words[i] = word
      active[wi] = False

      if i == len(words) - 1:
        bt_for_sep(sol_words, seps)
      else:
        backtrack_words(active, words, seps, sol_words, i + 1)

      active[wi] = True

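# Pass the lists directly: active is indexed by position in words,
# and a set would make the output order vary between runs.
backtrack_words([True] * len(a), a, b, [''] * len(a), 0)
print(count)  # 96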

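In general, when you need to enumerate every possibility over some set of values, backtracking is the way to go. The backtracking scheme is always the same: after it has been used to arrange the words, it is applied again to the separators.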
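The scheme (choose a value, recurse, undo the choice) can also be written without a global counter. The following is a minimal sketch rather than the answerer's code: backtrack_orderings is a hypothetical name, and it covers only the word-ordering step.

```python
def backtrack_orderings(words):
    """Yield every ordering of words using choose / recurse / undo."""
    used = [False] * len(words)
    current = []

    def extend():
        if len(current) == len(words):
            yield tuple(current)  # a complete ordering
            return
        for idx, word in enumerate(words):
            if used[idx]:
                continue
            used[idx] = True       # choose
            current.append(word)
            yield from extend()    # recurse
            current.pop()          # undo
            used[idx] = False

    yield from extend()

orderings = list(backtrack_orderings(["filename", "timestamp", "custom"]))
print(len(orderings))
```

Yielding tuples instead of printing keeps the enumeration separate from what is done with each result, so the caller can count, filter or join the orderings as needed.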


Edit

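The second part of the problem, described as finding combinations of separators, is really a matter of finding all arrangements with repetition, that is, ordered selections in which the same separator may be used more than once. This turned out to be simpler than I expected: after choosing a separator from seps, you do not remove it (disable it, in this code); you simply leave it available.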
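Disabling a chosen separator versus leaving it available corresponds to the difference between itertools.permutations and itertools.product. Here is a small sketch for the two gaps in the example, with variable names that are illustrative only:

```python
from itertools import permutations, product

seps = ["_", "-", ".", ""]
slots = 2  # gaps between three words

# A chosen separator is disabled, so it cannot appear twice.
without_reuse = list(permutations(seps, slots))
# A chosen separator stays available, so it can repeat.
with_reuse = list(product(seps, repeat=slots))

print(len(without_reuse), len(with_reuse))
print(sorted(set(with_reuse) - set(without_reuse)))
```

The tuples found only in with_reuse are exactly those that use the same separator in both gaps, such as ("_", "_").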
