A regex that replaces the spaces between words starting with a capital letter with underscores

Posted 2024-06-16 13:13:06

The input is as follows:

Roger Federer is a tennis player. Rafael Nadal Parera is also a tennis player. Another legend player is Novak Djokovic.

I expect the following output:

Roger_Federer is a tennis player. Rafael_Nadal_Parera is also a tennis player. Another legend player is Novak_Djokovic.

One solution I tried, using a positive lookbehind (with Python's re package), is:

re.sub(r"(?<=\w)\s([A-Z])", r"_\1", above_string)

But because the lookbehind uses \w, which also matches lowercase letters, I get this output (note "is_Novak"):

Roger_Federer is a tennis player. Rafael_Nadal_Parera is also a tennis player. Another legend player is_Novak_Djokovic.

Of course, I cannot make it work with r"(?<=[A-Z]\w*)\s([A-Z])", because that raises:

error: look-behind requires fixed-width pattern
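The difference between the two attempts can be reproduced with a small check. This is only a sketch: the helper name `compiles` is made up for illustration, and it merely reports whether the standard `re` module accepts each pattern.

```python
import re

def compiles(pattern):
    """Return (True, None) if re accepts the pattern, else (False, error message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)

# A lookbehind of exactly one character is fixed-width, so it is accepted.
fixed_ok, _ = compiles(r"(?<=\w)\s([A-Z])")

# [A-Z]\w* can match one or more characters, so the lookbehind width varies
# and the standard re module rejects the pattern.
variable_ok, message = compiles(r"(?<=[A-Z]\w*)\s([A-Z])")

print(fixed_ok, variable_ok, message)
```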

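I have to apply this regex to a large (and very diverse) set of articles, so I cannot afford any loops or brute-force str.replace calls. Can anyone suggest an efficient solution?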


1 answer
Anonymous user
#1 · Posted 2024-06-16 13:13:06

If you do not need to handle all Unicode uppercase letters, you can use

import re
above_string = "Roger Federer is a tennis player. Rafael Nadal Parera is also a tennis player. Another legend player is Novak Djokovic."
print( re.sub(r"\b([A-Z]\w*)\s+(?=[A-Z])", r"\1_", above_string) )
# => Roger_Federer is a tennis player. Rafael_Nadal_Parera is also a tennis player. Another legend player is Novak_Djokovic.

Pattern details:

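  • \b: a word boundary
  • ([A-Z]\w*): group 1 (\1), an uppercase ASCII letter followed by zero or more word characters
  • \s+: one or more whitespace characters
  • (?=[A-Z]): a positive lookahead that matches a position immediately followed by an uppercase ASCII letter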
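For repeated use over many articles, the same pattern can be compiled once and written in verbose mode so that each piece carries its explanation. This is a minimal sketch; the names `CAPITALIZED_WORD_GAP` and `join_capitalized` are hypothetical and not part of the answer.

```python
import re

# Same pattern as above, compiled once with re.VERBOSE so it can be annotated.
CAPITALIZED_WORD_GAP = re.compile(
    r"""
    \b            # word boundary
    ([A-Z]\w*)    # group 1: an uppercase ASCII letter, then zero or more word characters
    \s+           # one or more whitespace characters (these get replaced)
    (?=[A-Z])     # lookahead: the next character must be an uppercase ASCII letter
    """,
    re.VERBOSE,
)

def join_capitalized(text):
    """Replace the whitespace between consecutive capitalized words with '_'."""
    return CAPITALIZED_WORD_GAP.sub(r"\1_", text)

print(join_capitalized("Rafael Nadal Parera is also a tennis player."))
```

Because the lookahead does not consume the next word, that word can start the next match, which is how runs of three or more names are joined. Note that the pattern looks only at capitalization: a capitalized sentence opener followed by a name, as in "Yesterday Roger Federer won.", is joined as well.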

If you need to support all Unicode uppercase letters, I recommend the third-party regex module (pip install regex):

import regex
above_string = "Roger Federer is a tennis player. Rafael Nadal Parera is also a tennis player. Another legend player is Novak Djokovic."
print( regex.sub(r"\b(\p{Lu}\w*)\s+(?=\p{Lu})", r"\1_", above_string) )
# => Roger_Federer is a tennis player. Rafael_Nadal_Parera is also a tennis player. Another legend player is Novak_Djokovic.

Here, \p{Lu} matches any Unicode uppercase letter.
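If installing the regex module is not an option, a rough standard-library approximation is possible. The sketch below is an alternative technique, not the answer's method: it matches every word followed by whitespace and decides in a replacement callback, using `str.isupper()`, whether to join. `str.isupper()` is not identical to \p{Lu}, and a Python callback per match is likely slower than a plain replacement string, so treat this as an assumption-laden fallback. The names `_WORD_GAP` and `join_capitalized_unicode` are hypothetical.

```python
import re

# Any word, the whitespace after it, and (captured inside a lookahead)
# the first character of the following word. In Python 3, \w is Unicode-aware for str.
_WORD_GAP = re.compile(r"\b(\w+)(\s+)(?=(\w))")

def join_capitalized_unicode(text):
    """Join consecutive words that both start with an uppercase character."""
    def repl(match):
        word, gap, next_char = match.group(1), match.group(2), match.group(3)
        if word[0].isupper() and next_char.isupper():
            return word + "_"
        # Otherwise leave the word and its whitespace unchanged.
        return word + gap
    return _WORD_GAP.sub(repl, text)

print(join_capitalized_unicode("Émile Zola est un écrivain."))
```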
