(                # begin capture
  [A-Z]          # one uppercase letter    \ First word
  [a-z]+         # 1+ lowercase letters    /
  (?=\s[A-Z])    # lookahead: a space and an uppercase letter must follow
  (?:            # non-capturing group
    \s           # space
    [A-Z]        # one uppercase letter    \ Additional word(s)
    [a-z]+       # 1+ lowercase letters    /
  )+             # group can be repeated (more words)
)                # end capture
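As a rough sketch, the commented pattern above can be pasted almost verbatim into Python by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. The names `CAPITALIZED_PHRASE`, `sample`, and `multiword` are made up for this illustration; the sample sentence is the one used in the Perl snippet on this page.

```python
import re

# Verbose mode: whitespace and "#" comments inside the pattern are ignored.
CAPITALIZED_PHRASE = re.compile(
    r"""
    (                # begin capture
      [A-Z]          # one uppercase letter
      [a-z]+         # 1+ lowercase letters
      (?=\s[A-Z])    # lookahead: a space and an uppercase letter must follow
      (?:            # non-capturing group
        \s           # space
        [A-Z]        # one uppercase letter
        [a-z]+       # 1+ lowercase letters
      )+             # group can be repeated (more words)
    )                # end capture
    """,
    re.VERBOSE,
)

# Hypothetical sample input, borrowed from the Perl example.
sample = (
    "the United States of America has many big cities like "
    "New York and Los Angeles, and others like Atlanta"
)

# With a single capturing group, findall returns a flat list of strings.
multiword = CAPITALIZED_PHRASE.findall(sample)
print(multiword)  # ['United States', 'New York', 'Los Angeles']
```

Because the pattern requires at least one additional word, single capitalized words such as "America" and "Atlanta" are not matched.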
$mystring = "the United States of America has many big cities like New York and Los Angeles, and others like Atlanta";
@phrases = $mystring =~ /[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*/g;
print "\n" . join(", ", @phrases) . "\n\n# phrases = " . scalar(@phrases) . "\n\n";
Output:
$ ./try_me.pl
United States, America, New York, Los Angeles, Atlanta
# phrases = 5
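This is because findall returns all capture groups in the regular expression, and there are two of them: the outer group that captures the whole matched text, and an inner group that captures the subsequent words. To turn the second group into a non-capturing group, simply use (?:regex) instead of (regex).

Positive lookahead: it asserts that the current word is accepted only if it is followed by another word that starts with an uppercase letter. Breakdown: see the commented pattern at the top.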
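To illustrate the findall behaviour described above, here is a minimal sketch using Python's `re.findall`. The question's original pattern is not shown on this page, so the two patterns below are assumptions modelled on the commented breakdown; the variable names are hypothetical.

```python
import re

text = "the United States of America has many big cities like New York and Los Angeles"

# Two capturing groups: findall returns one tuple per match, (outer, inner).
# A repeated group keeps only its last repetition, hence ' States', ' York', ...
with_inner_capture = re.findall(
    r"([A-Z][a-z]+(?=\s[A-Z])(\s[A-Z][a-z]+)+)", text
)

# Inner group written as (?:...): only the outer group is captured,
# so findall returns a flat list of the matched phrases.
phrases = re.findall(
    r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)", text
)

print(with_inner_capture)
# [('United States', ' States'), ('New York', ' York'), ('Los Angeles', ' Angeles')]
print(phrases)
# ['United States', 'New York', 'Los Angeles']
```

The only difference between the two patterns is `(?:` in place of `(` on the inner group, which is enough to change the return value from tuples to plain strings.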