Python: how to truncate runs of more than two identical characters in a string

6 votes

5 answers

4796 views

Asked 2025-04-16 07:36

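I'm looking for an efficient way to process a string so that every run of more than two consecutive identical characters is cut down to its first two characters.

Here are some example inputs and outputs: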

hellooooooooo -> helloo
woooohhooooo -> woohhoo

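Right now I do this by looping over the characters, but it is a bit slow. Does anyone have another solution (a regular expression, or some other approach)?

Edit: my current code: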

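word_new = ""
# Keep word[i] unless it starts three identical characters in a row.
for i in range(0, len(word) - 2):
    if not word[i] == word[i+1] == word[i+2]:
        word_new = word_new + word[i]
# The last two characters are always kept.
for i in range(len(word) - 2, len(word)):
    word_new = word_new + word[i]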

Tags: regular expressions, string processing, efficient algorithms, loop optimization, string truncation

5 Answers

This also uses a regular expression, but without a replacement function:

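import re

# One character followed by two or more copies of itself,
# i.e. a run of three or more identical characters.
expr = r'(.)\1{2,}'
replace_by = r'\1\1'

mystr1 = 'hellooooooo'
print(re.sub(expr, replace_by, mystr1))  # helloo

mystr2 = 'woooohhooooo'
print(re.sub(expr, replace_by, mystr2))  # woohhoo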

Answered 2025-04-16 by Python大师


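The code below (unlike the other regex-based answers) does exactly what you asked for: it replaces every run of more than two identical consecutive characters with two of that character.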

>>> import re
>>> text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'
>>> pattern = r'(.)\1{2,}'
>>> repl = r'\1\1'
>>> re.sub(pattern, repl, text, flags=re.DOTALL)
'the numberr off\n\nthee beast is 66 ..'
>>>

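You may not want to apply this to certain characters, such as digits, punctuation, spaces, tabs or newlines. In that case, replace the . with a stricter sub-pattern.

For example:

ASCII letters: [A-Za-z]

Any letter: [^\W\d_]. In Python 3, str patterns are Unicode-aware by default, so this class already matches any Unicode letter. The re.LOCALE flag, which makes the match depend on the current locale, is only accepted for bytes patterns.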
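
As a minimal sketch of that restriction, the snippet below swaps the dot for [^\W\d_], so only runs of letters are shortened. The names LETTER_RUN and squeeze_letters are made up for illustration, and the code assumes Python 3 with str input.

```python
import re

# Assumption: only letters should be squeezed; digits, punctuation and
# whitespace are left untouched. LETTER_RUN is a hypothetical name.
LETTER_RUN = re.compile(r'([^\W\d_])\1{2,}')

def squeeze_letters(text):
    """Cut every run of three or more identical letters down to two."""
    return LETTER_RUN.sub(r'\1\1', text)

print(repr(squeeze_letters('the numberr offf\n\n\n\ntheeee beast is 666 ...')))
```

With this pattern the four newlines, the 666 and the ... in the sample text stay as they are, while offf and theeee are shortened.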

Answered 2025-04-16 by Python大师


Edit: revised based on helpful comments

import re

def ReplaceThreeOrMore(s):
    # pattern to look for three or more repetitions of any character, including
    # newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL) 
    return pattern.sub(r"\1\1", s)
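
If the limit of two ever needs to change, the same idea can be parameterized. This is only a sketch: truncate_runs and max_run are hypothetical names, not part of the answer above, and it assumes max_run is at least 1.

```python
import re

def truncate_runs(s, max_run=2):
    """Shorten every run of more than max_run identical characters to max_run.

    Hypothetical helper; assumes max_run >= 1.
    """
    # One character followed by at least max_run copies of itself,
    # i.e. a run strictly longer than max_run. DOTALL lets '.' match newlines.
    pattern = re.compile(r"(.)\1{%d,}" % max_run, re.DOTALL)
    # Replace the whole run with exactly max_run copies of the character.
    return pattern.sub(r"\1" * max_run, s)

print(truncate_runs("hellooooooooo"))     # helloo
print(truncate_runs("woooohhooooo"))      # woohhoo
print(truncate_runs("hellooooooooo", 1))  # helo
```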

(Original answer follows.) Try something like this:

import re

# look for a character followed by at least one repetition of itself.
pattern = re.compile(r"(\w)\1+")

# a function to perform the substitution we need:
def repl(matchObj):
   char = matchObj.group(1)
   return "%s%s" % (char, char)

>>> pattern.sub(repl, "Foooooooooootball")
'Football'
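
For comparison, since the question also asks about approaches other than regular expressions, here is a regex-free sketch built on itertools.groupby. The function name truncate_runs_groupby is an assumption made for this example, and no claim is made about how its speed compares with the regex versions.

```python
from itertools import groupby, islice

def truncate_runs_groupby(s, max_run=2):
    """Keep at most max_run characters from each run of identical characters."""
    # groupby yields one group per run of equal consecutive characters;
    # islice takes only the first max_run characters of each group.
    return "".join(
        "".join(islice(group, max_run)) for _, group in groupby(s)
    )

print(truncate_runs_groupby("hellooooooooo"))  # helloo
print(truncate_runs_groupby("woooohhooooo"))   # woohhoo
```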

Answered 2025-04-16 by Python大师
