How can I detect the identical parts of two strings?


Asked by 数据工程师 on 2025-04-15 22:01

I'm trying to break my question about a decoding algorithm into smaller questions. This is the first part.

Problem:

There are two strings, s1 and s2.
Part of s1 is identical to part of s2.
Spaces are the separators.
How can I extract the identical parts?

Example 1:

s1 = "12 November 2010 - 1 visitor"
s2 = "6 July 2010 - 100 visitors"

the identical parts are "2010", "-", "1" and "visitor"

Example 2:

s1 = "Welcome, John!"
s2 = "Welcome, Peter!"

the identical parts are "Welcome," and "!"

Example 3 (to make the "!" case clearer):

s1 = "Welcome, Sam!"
s2 = "Welcome, Tom!"

the identical parts are "Welcome," and "m!"

Python or Ruby preferred. Thanks.

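Tags: data structures, text processing, string comparison, string matching, algorithm design, decoding algorithm, space delimiter, identical part extraction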

4 Answers

s1 = "12 November 2010 - 1 visitor"
s2 = "6 July 2010 - 100 visitors"
l1 = s1.split()
for item in l1:
    # Substring test against all of s2, so "1" also matches inside "100"
    if item in s2:
        print(item)

This splits on whitespace.

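Splitting on word boundaries instead (for example, to catch the "!" in example 2) did not work in Python before version 3.7, because re.split() could not split on zero-length matches such as \b. Since Python 3.7 it can.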
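On Python 3.7 or later, splitting on \b is a one-liner. The sketch below is only an illustration; boundary_tokens and common_tokens are hypothetical helper names, and it assumes exact token equality is the matching rule.

```python
import re

def boundary_tokens(s):
    # Split at every word boundary. \b matches an empty string, which
    # re.split() only accepts from Python 3.7 on.
    pieces = re.split(r"\b", s)
    # Drop whitespace-only pieces and trim the rest, e.g. ", " -> ","
    return [p.strip() for p in pieces if p.strip()]

def common_tokens(s1, s2):
    # Hypothetical helper: tokens of s1 that are also tokens of s2,
    # kept in s1's order.
    t2 = set(boundary_tokens(s2))
    return [t for t in boundary_tokens(s1) if t in t2]

print(boundary_tokens("Welcome, John!"))  # ['Welcome', ',', 'John', '!']
print(common_tokens("Welcome, John!", "Welcome, Peter!"))  # ['Welcome', ',', '!']
```

Note that the comma becomes its own token here, so the result is not exactly the "Welcome," from the question.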

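The third example goes further and treats any substring of a word as a possible match. That makes things much more complicated, because there are many more candidates: for 1234 I would have to check 1234, 123, 234, 12, 23, 34, 1, 2, 3 and 4. In general a word of n characters has n(n+1)/2 contiguous substrings, so the number of candidates grows quadratically with the word length.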
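A short sketch that enumerates every contiguous substring of a word, longest first, reproduces the candidate list for 1234 and confirms the n(n+1)/2 count. The function name substrings is a hypothetical helper.

```python
def substrings(word):
    # Every contiguous substring of `word`, longest first, left to right.
    n = len(word)
    result = []
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            result.append(word[start:start + length])
    return result

print(substrings("1234"))
# ['1234', '123', '234', '12', '23', '34', '1', '2', '3', '4']
print(len(substrings("12345")))  # 15 == 5 * 6 // 2
```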

Answered 2025-04-15 by Python大师


For the first example, for instance:

>>> s1 = '12 November 2010 - 1 visitor'
>>> s2 = '6 July 2010 - 100 visitors'
>>> [i for i in s1.split() if any(i in j for j in s2.split())]
['2010', '-', '1', 'visitor']

For the second example:

>>> s1 = "Welcome, John!"
>>> s2 = "Welcome, Peter!"
>>> [i for i in s1.replace('!', ' !').split() if any(i in j for j in s2.replace('!', ' !').split())]
['Welcome,', '!']

Note: the code above does not work for the third example, which the asker added later.
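The '!' trick above can be generalized a little. The sketch below is an assumption-laden variant, not the answer's exact code: tokens and common_parts are hypothetical helpers, and the detach parameter (a string of characters to split off as separate tokens) is invented for illustration.

```python
def tokens(s, detach="!"):
    # Put a space in front of each character in `detach` (the answer
    # hard-codes '!') so that it splits off as its own token.
    for ch in detach:
        s = s.replace(ch, " " + ch)
    return s.split()

def common_parts(s1, s2, detach="!"):
    # Keep a token of s1 if it is a substring of some token of s2,
    # the same test as the answer's any(...) comprehension.
    t2 = tokens(s2, detach)
    return [i for i in tokens(s1, detach) if any(i in j for j in t2)]

print(common_parts("Welcome, John!", "Welcome, Peter!"))  # ['Welcome,', '!']
print(common_parts("Welcome, Sam!", "Welcome, Tom!"))     # ['Welcome,', '!']
```

For the Sam/Tom pair it returns "!" rather than the expected "m!", because "Sam" is never compared character by character with "Tom".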

Answered 2025-04-15 by Python大师


Edit: I updated the code so that it works for all the examples, including the first one:

def scan(s1, s2):
    # Return the longest common prefix of s1 and s2,
    # or None if they do not share a first character
    l = min(len(s1), len(s2))
    while l:
        if s1[:l] == s2[:l]:
            return s1[:l]
        l -= 1
    return None

def contains(s1, s2):
    D = {}  # dict keys used as an ordered set to drop duplicates
    L1 = s1.split(' ')
    L2 = s2.split(' ')

    # Indices of words in s2 that have already matched, so each word
    # of s2 is used at most once (needed for example #1)
    DProcessed = set()

    for x in L1:
        for yy, y in enumerate(L2):
            if yy in DProcessed:
                continue

            # Longest common prefix of the two words
            a = scan(x, y)
            if a:
                DProcessed.add(yy)
                D[a] = None
                break

            # Longest common suffix: scan the reversed words
            a = scan(x[::-1], y[::-1])
            if a:
                DProcessed.add(yy)
                D[a[::-1]] = None
                break

    return list(D)

print(contains("12 November 2010 - 1 visitor",
               "6 July 2010 - 100 visitors"))
print(contains("Welcome, John!",
               "Welcome, Peter!"))
print(contains("Welcome, Sam!",
               "Welcome, Tom!"))

Output (Python 3.7+, where dicts keep insertion order):

['1', '2010', '-', 'visitor']
['Welcome,', '!']
['Welcome,', 'm!']
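A different technique, not the one used in this answer: the standard library's difflib.SequenceMatcher finds matching character blocks between two strings, and splitting each block on spaces yields the expected parts for all three examples. This is a sketch assuming character-level matches are what is wanted; matching_parts is a hypothetical helper name, and on other inputs it may also report short coincidental fragments such as a single shared letter.

```python
from difflib import SequenceMatcher

def matching_parts(s1, s2):
    # Character-level matching blocks between s1 and s2; autojunk=False
    # turns off difflib's "popular character" heuristic.
    sm = SequenceMatcher(None, s1, s2, autojunk=False)
    parts = []
    for block in sm.get_matching_blocks():
        # Each block is (a, b, size); the last one is a size-0 sentinel.
        text = s1[block.a:block.a + block.size]
        # Spaces are separators, so split every block into space-free pieces.
        parts.extend(text.split())
    return parts

print(matching_parts("12 November 2010 - 1 visitor", "6 July 2010 - 100 visitors"))
# ['2010', '-', '1', 'visitor']
print(matching_parts("Welcome, John!", "Welcome, Peter!"))
# ['Welcome,', '!']
print(matching_parts("Welcome, Sam!", "Welcome, Tom!"))
# ['Welcome,', 'm!']
```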

Answered 2025-04-15 by Python大师
