Python - 如何找到两个字符串的所有交集？

7 投票

6 回答

10572 浏览

提问于 2025-04-17 03:12

如何找到两个字符串之间的所有交集（也叫最长公共子串）以及它们在两个字符串中的位置呢？

举个例子，如果 S1="never" 和 S2="forever"，那么它们的交集应该是 ["ever"]，而这个子串在两个字符串中的位置是 [(1,3)]。再比如，如果 S1="address" 和 S2="oddness"，那么它们的交集就是 ["dd","ess"]，对应的位置是 [(1,1),(4,4)]。

最好是找一个不需要用到任何库的最简解决方案，不过任何正确的解决方法也都可以。

字符串处理字符串匹配算法最长公共子串字符串交集

6 个回答

这是我想到的内容：

import itertools

def longest_common_substring(s1, s2):
   set1 = set(s1[begin:end] for (begin, end) in
              itertools.combinations(range(len(s1)+1), 2))
   set2 = set(s2[begin:end] for (begin, end) in
              itertools.combinations(range(len(s2)+1), 2))
   common = set1.intersection(set2)
   maximal = [com for com in common
              if sum((s.find(com) for s in common)) == -1 * (len(common)-1)]
   return [(s, s1.index(s), s2.index(s)) for s in maximal]

检查一些数值：

>>> longest_common_substring('address', 'oddness')
[('dd', 1, 1), ('ess', 4, 4)]
>>> longest_common_substring('never', 'forever')
[('ever', 1, 3)]
>>> longest_common_substring('call', 'wall')
[('all', 1, 1)]
>>> longest_common_substring('abcd1234', '1234abcd')
[('abcd', 0, 4), ('1234', 4, 0)]

回答于 2025-04-17 由 Python大师

分享举报

这个操作可以在 O(n+m) 的时间内完成，其中 n 和 m 分别是输入字符串的长度。

伪代码如下：

function LCSubstr(S[1..m], T[1..n])
    L := array(1..m, 1..n)
    z := 0
    ret := {}
    for i := 1..m
        for j := 1..n
            if S[i] = T[j]
                if i = 1 or j = 1
                    L[i,j] := 1
                else
                    L[i,j] := L[i-1,j-1] + 1
                if L[i,j] > z
                    z := L[i,j]
                    ret := {}
                if L[i,j] = z
                    ret := ret ∪ {S[i-z+1..z]}
    return ret

想了解更多细节，可以查看最长公共子串问题的维基百科文章。

回答于 2025-04-17 由 Python大师

分享举报

你说你不能使用任何库。不过，Python自带的difflib库里有一个函数，正好可以满足你的需求。考虑到这是一个Python面试问题，面试官可能希望你对difflib有所了解。

In [31]: import difflib

In [32]: difflib.SequenceMatcher(None, "never", "forever").get_matching_blocks()
Out[32]: [Match(a=1, b=3, size=4), Match(a=5, b=7, size=0)]


In [33]: difflib.SequenceMatcher(None, "address", "oddness").get_matching_blocks()
Out[33]: [Match(a=1, b=1, size=2), Match(a=4, b=4, size=3), Match(a=7, b=7, size=0)]

你可以忽略最后一个Match元组，因为它是个虚假的（根据文档的说法）。

回答于 2025-04-17 由 Python大师

分享举报

Python - 如何找到两个字符串的所有交集？

6 个回答

撰写回答