Regex to extract words with alternating vowels and consonants (1 vowel, 1 consonant) from a string

Posted 2024-06-16 12:59:09

Note: I am studying regex. I know regex is not the best solution for this problem, but I am still interested in whether it can be done and how.

Task:

You are given a block of text with different words. These words are separated by white-spaces and punctuation marks. Numbers are not considered words in this mission (a mix of letters and digits is not a word either). You should count the number of words (striped words) where the vowels with consonants are alternating, that is; words that you count cannot have two consecutive vowels or consonants. The words consisting of a single letter are not striped -- do not count those. Casing is not significant for this mission.

Input: A text as a string (unicode)

Output: A quantity of striped words as an integer.

Example: string1 = "Dog,cat,mouse,bird.Human." should return 3.


1 answer
User
#1 · Posted 2024-06-16 12:59:09

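Well, I think there are two main approaches. The first is to check directly that vowels and consonants alternate.

For example, to check that a and b alternate, you could use something like: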

(?:ab)+a?|(?:ba)+b?
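To see how an a/b alternation pattern behaves, here is a minimal Python sketch. It uses `(?:ab)+a?|(?:ba)+b?`, which is one way to accept alternating runs of two or more letters of either parity. The helper name `is_alternating` is made up for this illustration.

```python
import re

# One way to express "a and b strictly alternate, at least two letters".
AB_ALTERNATING = re.compile(r"(?:ab)+a?|(?:ba)+b?")

def is_alternating(s: str) -> bool:
    """Return True if the whole string is an alternating run of a and b."""
    return AB_ALTERNATING.fullmatch(s) is not None

for sample in ["ab", "aba", "baba", "aab", "abba", "a"]:
    print(sample, is_alternating(sample))
```

`re.fullmatch` is used so that the entire string has to fit the pattern, which plays the role that the word-boundary lookarounds play in the full regex.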

Extending this to vowels and consonants gives a rather long regex:

^{pr2}$


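The two negative lookarounds, (?<![a-z]) and (?![a-z]), act as word boundaries and ensure that the whole word is checked.

[aeiou] stands for a vowel and [^\P{L}aeiou] stands for a consonant. For ASCII letters the latter is equivalent to [b-df-hj-np-tv-z].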
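As a rough sketch of what such an extended pattern could look like in Python (an assumption about its shape, not necessarily the exact regex from this answer), the a/b alternation can be rebuilt with a vowel class and a consonant class. The names `VOWEL`, `CONSONANT`, and `ALTERNATING_WORD` are illustrative, and the ASCII class `[b-df-hj-np-tv-z]` is used in place of `[^\P{L}aeiou]`.

```python
import re

VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"  # ASCII stand-in for the consonant class

# Assumed shape: (VC)+V? or (CV)+C?, wrapped in the two letter lookarounds.
ALTERNATING_WORD = re.compile(
    rf"(?<![a-z])"
    rf"(?:(?:{VOWEL}{CONSONANT})+{VOWEL}?"
    rf"|(?:{CONSONANT}{VOWEL})+{CONSONANT}?)"
    rf"(?![a-z])",
    re.IGNORECASE,
)

text = "Dog,cat,mouse,bird.Human."
matches = ALTERNATING_WORD.findall(text)
print(matches)       # ['Dog', 'cat', 'Human']
print(len(matches))  # 3
```

Because each branch requires at least one vowel-consonant pair, single-letter words are never matched. The lookarounds only test for letters, so a token that mixes letters and digits is not excluded by this sketch.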

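The second approach is to make sure the word contains no two consecutive vowels and no two consecutive consonants. This needs one more negative lookahead, but the regex is somewhat shorter: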

(?<![a-z])(?:(?![aeiou]{2}|[^\P{L}aeiou]{2})[a-z])+(?![a-z])


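You can use re.findall together with re.I (or put (?i) at the start of the regex) to get all matches, then take the length of the resulting list to get the count of "striped words". Note that Python's built-in re module does not support \P{L}, so either substitute [b-df-hj-np-tv-z] for [^\P{L}aeiou] or use the third-party regex module.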

(?<![a-z])              # Ensure no letter before
  (?:
    (?!
      [aeiou]{2}        # Ensure no two consecutive vowels
    |
      [^\P{L}aeiou]{2}  # Ensure no two consecutive consonants
    )
    [a-z]               # Any letter
  )+
(?![a-z])               # Ensure no more letters
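The lookahead-based pattern can be tried out with Python's built-in `re` module along the following lines. This is a sketch under two assumptions: the consonant class is written as the ASCII range `[b-df-hj-np-tv-z]` because `re` has no `\P{L}`, and matches of length one are filtered out afterwards, since the pattern as written also matches a lone letter while the task does not count single-letter words. The function name `count_striped` is hypothetical.

```python
import re

STRIPED = re.compile(
    r"""
    (?<![a-z])                  # no letter before
    (?:
      (?!
        [aeiou]{2}              # no two consecutive vowels
      |
        [b-df-hj-np-tv-z]{2}    # no two consecutive consonants
      )
      [a-z]                     # any letter
    )+
    (?![a-z])                   # no letter after
    """,
    re.IGNORECASE | re.VERBOSE,
)

def count_striped(text: str) -> int:
    """Count alternating vowel/consonant words of at least two letters."""
    # Drop single-letter matches, which the task does not count.
    return sum(1 for word in STRIPED.findall(text) if len(word) >= 2)

print(count_striped("Dog,cat,mouse,bird.Human."))  # 3
```

As in the answer's regex, the lookarounds only look at letters, so tokens that mix letters and digits would need extra handling to follow the task statement exactly.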
