Word compression function using regular expressions in Python

>>> import re
>>> sentence = "But the third reason Americans should care about Europe is more important even than the risk of a renewed financial crisis."
>>> regexp = r'^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]'
>>> def compress(word):
...     pieces = re.findall(regexp, word)
...     return ''.join(pieces)
...
>>> compress(sentence)
'Bt th thrd rsn mrcns shld cr bt rp s mr mprtnt vn thn th rsk f  rnwd fnncl crss.'

2 answers

User

#1 · Edited 2024-04-18 22:01:31

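'^[AEIOUaeiou]+' matches only a run of consecutive vowels at the start of the string.

'[AEIOUaeiou]+$' matches only a run of consecutive vowels at the end of the string.

'[^AEIOUaeiou]' matches a single character that is not a vowel.

If it were '[^AEIOUaeiou]+', it would match any run of consecutive non-vowel characters.

As written, the pattern captures only one non-vowel character at a time in the sentence you used.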
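To make the difference between the two character-class forms concrete, here is a small sketch. The sample word "strength" is an arbitrary choice for illustration and is not from the question.

```python
import re

word = "strength"  # illustrative input, not from the original question

# Without '+': one match per non-vowel character.
single = re.findall(r'[^AEIOUaeiou]', word)
# With '+': one match per run of consecutive non-vowel characters.
runs = re.findall(r'[^AEIOUaeiou]+', word)

print(single)  # ['s', 't', 'r', 'n', 'g', 't', 'h']
print(runs)    # ['str', 'ngth']

# Joined back together, both give the same compressed text.
print(''.join(single) == ''.join(runs))  # True
```

Since the pieces are joined afterwards, adding the '+' changes how many matches are returned but not the final string.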

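Your comment explains what you are trying to do.
There is no need for a regular expression here; I think solving this with a regex is harder, or at least more complicated.

Does this do what you need?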

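def compress(word):
    # Words shorter than three characters have no interior to compress.
    if len(word) < 3:
        yield word
    else:
        # Always keep the first character.
        yield word[0]
        # Drop vowels from the interior of the word only.
        for c in word[1:-1]:
            if c not in 'AEIOUaeiou':
                yield c
        # Always keep the last character.
        yield word[-1]


# `sentence` is the string defined in the question.
print(' '.join(''.join(compress(word)) for word in sentence.split()))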
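If you would rather keep the regex, the same split-per-word idea might also work with the pattern from the question, because `^` and `$` then refer to each individual word. This is only a sketch; the helper names `compress_word` and `compress_sentence` are made up for illustration.

```python
import re

# Pattern from the question, unchanged.
regexp = r'^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]'

def compress_word(word):
    # Hypothetical helper: apply the pattern to one word at a time,
    # so ^ and $ anchor to that word's start and end.
    return ''.join(re.findall(regexp, word))

def compress_sentence(text):
    # Hypothetical helper: split on whitespace, compress each word, rejoin.
    return ' '.join(compress_word(w) for w in text.split())

print(compress_sentence("care about Europe"))  # cre abt Eurpe
```

Note that the two approaches are not identical: the regex keeps a whole run of leading or trailing vowels ("Europe" becomes "Eurpe"), while the generator keeps only the first and last character ("Europe" becomes "Erpe").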

User

#2 · Edited 2024-04-18 22:01:31

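^ and $ anchor to the start and end of the entire string, so you are anchoring to the start and end of the whole sentence rather than of each word. When the sentence consists only of the word "about", it behaves as you expect. I think you want to anchor to word boundaries (\b) instead.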
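A quick way to see the anchoring behaviour is to run the question's pattern on a single word and then on a short phrase. The phrase below is an arbitrary excerpt chosen for illustration.

```python
import re

# Pattern from the question: ^ and $ refer to the whole input string.
regexp = r'^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]'

def compress(text):
    return ''.join(re.findall(regexp, text))

# Single word: the leading vowel sits at the start of the string, so it is kept.
print(compress("about"))              # abt

# Several words: only the very first and very last positions are anchored,
# so the leading 'a' of "about" and the 'Eu' of "Europe" are dropped.
print(compress("care about Europe"))  # cr bt rpe
```

Only the final "e" of the phrase survives, because it is the one vowel that touches the end of the whole string.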

http://www.regular-expressions.info/wordboundaries.html

This may do what you want:

regexp = r'\b[AEIOUaeiou]+|[AEIOUaeiou]+\b|[^AEIOUaeiou]'
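As a rough check of the word-boundary version, the sketch below runs it on a short excerpt of the sentence; the excerpt and the function name `compress` are reused here only for illustration.

```python
import re

# Word-boundary variant: \b matches at the edges of each word,
# not just at the edges of the whole string.
regexp = r'\b[AEIOUaeiou]+|[AEIOUaeiou]+\b|[^AEIOUaeiou]'

def compress(text):
    return ''.join(re.findall(regexp, text))

print(compress("care about Europe"))  # cre abt Eurpe
print(compress("of a crisis."))       # of a crss.
```

Leading and trailing vowel runs of every word are now kept, and interior vowels are still removed. Punctuation counts as a non-word character, so the "." after "crisis" still marks a word boundary.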
