Find all common substrings between two strings, regardless of case and ord

Posted 2024-05-15 23:20:17


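I started from the answer to the question "Function to find all common substrings in two strings not giving correct output" and modified the code slightly, converting the strings to lowercase so that matching is case-independent (for example, 'AbCd' is treated the same as 'abcd', and so on). However, for strings such as 'ABCDXGHIJ' and 'ghijYAbCd', it returns only ['ghij'] instead of the desired output ['ABCD', 'GHIJ'].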

Here are some other examples:

  • 'Bonywasawarrior' and 'Bonywasxwarrior' (output: ['Bonywas', 'warrior', 'wa'], desired output: ['Bonywas', 'warrior'])
  • '01101001' and '101010' (output: ['1010', '0', '1010', '01', '10', '01'], desired output: ['1010'])

Here is my code:

t = int(input()) #t cases

while t > 0:
    A = str(input()) #1st string
    B = str(input()) #2nd string

    low_A = A.lower()
    low_B = B.lower()

    answer = ""
    anslist=[]
    for i in range(len(A)):
        common = ""
        for j in range(len(B)):
            if (i + j < len(A) and low_A[i + j] == low_B[j]):
                common += B[j]
            else:
                #if (len(common) > len(answer)): 
                answer = common
                if answer != '' and len(answer) > 1:
                    anslist.append(answer)
                common = ""

        if common != '':
            anslist.append(common)

    if len(anslist) == 0:
        print('[]') #print if no common substring
    else:
        print(anslist)
    t -= 1

2 answers

This is a duplicate of "Finding all the common substrings of given two strings", which offers a Java solution. I did my best to convert it to Python and "enhanced" it to be case-insensitive:

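def find_common(s, t):
    # table[i][j] = length of the common run ending at s[i] and t[j]
    table = [len(t) * [0] for _ in range(len(s))]
    longest = 0
    result = set()
    # Compare lowercased characters so the match ignores case
    for i, ch1 in enumerate(s.lower()):
        for j, ch2 in enumerate(t.lower()):
            if ch1 != ch2:
                continue
            # Extend the run from the diagonal neighbour, or start a new one
            table[i][j] = 1 if i == 0 or j == 0 else 1 + table[i - 1][j - 1]
            if table[i][j] > longest:
                # A longer run was found: discard the shorter candidates
                longest = table[i][j]
                result.clear()
            if table[i][j] == longest:
                # Slice from the original s to keep its casing
                result.add(s[i - longest + 1:i + 1])
    return result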


print(find_common('Bonywasawarrior', 'Bonywasxwarrior'))
print(find_common('01101001', '101010'))
print(find_common('ABCDXGHIJ', 'ghijYAbCd'))

Prints:

{'Bonywas', 'warrior'}
{'1010'}
{'GHIJ', 'ABCD'}
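
Since find_common only ever reads the previous row of the table (table[i - 1][j - 1]), the same result can probably be obtained while keeping just two rows in memory. The following is a minimal sketch of that idea, not part of the linked Java solution; the name find_common_rolling is hypothetical, and like the code above it assumes that lowercasing does not change the length of either string.

```python
def find_common_rolling(s, t):
    """Longest common substrings of s and t, ignoring case (two-row sketch)."""
    low_s, low_t = s.lower(), t.lower()
    prev = [0] * len(t)  # run lengths ending at the previous character of s
    longest = 0
    result = set()
    for i, ch1 in enumerate(low_s):
        curr = [0] * len(t)  # run lengths ending at s[i]
        for j, ch2 in enumerate(low_t):
            if ch1 != ch2:
                continue
            # prev is all zeros for i == 0, so only j == 0 needs a special case
            curr[j] = 1 if j == 0 else prev[j - 1] + 1
            if curr[j] > longest:
                longest = curr[j]
                result.clear()
            if curr[j] == longest:
                # Slice the original string so the casing of s is preserved
                result.add(s[i - longest + 1:i + 1])
        prev = curr
    return result


print(find_common_rolling('ABCDXGHIJ', 'ghijYAbCd'))
```

As with find_common, the result is a set, so the order in which the substrings are printed may vary between runs.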

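You can increment an offset in a while loop to keep concatenating the common characters at that offset from the respective indices until they differ. To find the longest, non-overlapping common substrings, you can use a function that recursively tries the different paths of substring partitions and returns the path with the longest substring lengths: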

def common_strings(a, b, i=0, j=0):
    candidates = []
    len_a = len(a)
    len_b = len(b)
    if j == len_b:
        candidates.append(common_strings(a, b, i + 1, 0))
    elif i < len_a:
        offset = 0
        while i + offset < len_a and j + offset < len_b and a[i + offset].lower() == b[j + offset].lower():
            offset += 1
        if offset > 1:
            candidates.append([a[i: i + offset]] + common_strings(a, b, i + offset, j + offset))
        candidates.append(common_strings(a, b, i, j + 1))
    return candidates and max(candidates, key=lambda t: sorted(map(len, t), reverse=True))

so that:

print(common_strings('ABCDXGHIJ', 'ghijYAbCd'))
print(common_strings('Bonywasawarrior', 'Bonywasxwarrior'))
print(common_strings('01101001', '101010'))

Output:

['ABCD', 'GHIJ']
['Bonywas', 'warrior']
['1010']
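
The recursive common_strings above may revisit the same (i, j) position through many different paths, which can get slow on longer inputs. One possible mitigation is to cache the result for each position. The sketch below assumes the same selection rule as common_strings; the names common_strings_cached and solve are hypothetical, the early return for empty inputs is an addition, and paths are built as tuples so that cached values cannot be mutated.

```python
from functools import lru_cache


def common_strings_cached(a, b):
    """Memoized sketch of the recursive search; returns a list of substrings of a."""
    len_a, len_b = len(a), len(b)
    if not len_a or not len_b:
        return []  # added guard: nothing can match an empty string

    @lru_cache(maxsize=None)
    def solve(i, j):
        candidates = []
        if j == len_b:
            # b is exhausted for this start in a: move to the next start
            candidates.append(solve(i + 1, 0))
        elif i < len_a:
            offset = 0
            # Extend the case-insensitive match as far as it goes
            while (i + offset < len_a and j + offset < len_b
                   and a[i + offset].lower() == b[j + offset].lower()):
                offset += 1
            if offset > 1:
                # Take the match, then continue after it in both strings
                candidates.append((a[i:i + offset],) + solve(i + offset, j + offset))
            # Or skip this position in b
            candidates.append(solve(i, j + 1))
        if not candidates:
            return ()
        # Prefer the path whose substring lengths, sorted descending, are largest
        return max(candidates, key=lambda path: sorted(map(len, path), reverse=True))

    return list(solve(0, 0))


print(common_strings_cached('ABCDXGHIJ', 'ghijYAbCd'))
```

The recursion depth still grows with the product of the two lengths, so very long inputs may hit Python's recursion limit; an iterative formulation would be needed in that case.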
