Regex to remove "by" from a string

Posted 2024-06-11 20:56:08


Update 2: https://regex101.com/r/bE5aWW/2

Update: this is what I have come up with so far, https://regex101.com/r/bE5aWW/1/, but I still need help getting rid of .

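Case 1

by Ally Foster (wrapped in literal \n sequences and runs of spaces, as in test_str below)

Case 2

Ally Foster (wrapped the same way)

Case 3

by name name

Case 4

name name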

I would like to select only the name part from the strings above, i.e. name name. The pattern I came up with, (?:by)? ([\w ]+), does not work when there are spaces before by.

Thanks

Code generated by regex101:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:by)? ([\w ]+)"

test_str = ("\\n                                \\n                                   by Ally Foster\\n                                \\n                            \n\n"
    "\\n                                \\n                                   Ally Foster\\n                                \\n                            \n\n"
    "by name name\n\n"
    "name name")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
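A minimal Python 3 reproduction of the problem, using short test strings chosen for illustration rather than the exact regex101 data. Because (?:by)? is optional while the space after it is mandatory, the engine can skip by and start the match at any space: with leading spaces the match begins at the first space and group 1 swallows the by, and with no leading space the first word is lost.

```python
import re

# The question's pattern: an optional "by", then a mandatory space, then the capture.
QUESTION_RE = re.compile(r"(?:by)? ([\w ]+)")

for text in ["by name name", "   by Ally Foster", "name name"]:
    m = QUESTION_RE.search(text)
    print(repr(m.group(1)))

# 'name name'          <- works: "by" sits right at the start of the match
# '  by Ally Foster'   <- match starts at the first leading space, so "by" is captured
# 'name'               <- no space before the first word, so that word is skipped
```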

2 Answers

I suggest using

re.findall(r'\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*', s)

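See the regex demo. In Python 2 you need to pass the re.U flag so that all shorthand character classes and word boundaries are Unicode-aware. To match tabs as well as spaces, replace each space in the pattern with [ \t].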
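A sketch of the suggested pattern in Python 3, where str patterns are already Unicode-aware, so re.U is not needed. It assumes the \n sequences in the question's data are real newlines; if they are literal backslash-n pairs, as in the regex101 test_str, the n is a letter and the pattern picks it up too. The names NAME_RE, NAME_RE_TABS and cases are illustrative only.

```python
import re

# Pattern from the answer above (spaces only between words).
NAME_RE = re.compile(r"\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*")

# Variant with every space replaced by [ \t], as the answer suggests.
NAME_RE_TABS = re.compile(r"\b(?!by\b)[^\W\d_]+(?:[ \t]*(?:,[ \t]*)?[^\W\d_]+)*")

# Assumption: real newlines, not literal backslash-n pairs.
cases = [
    "\n    \n       by Ally Foster\n    \n",
    "\n    \n       Ally Foster\n    \n",
    "by name name",
    "name name",
]

for text in cases:
    print(NAME_RE.findall(text))
# ['Ally Foster'] / ['Ally Foster'] / ['name name'] / ['name name']

print(NAME_RE.findall("by\tAlly\tFoster"))       # ['Ally', 'Foster']
print(NAME_RE_TABS.findall("by\tAlly\tFoster"))  # ['Ally\tFoster']
```

The spaces-only version splits a tab-separated name into separate matches; the [ \t] variant keeps it together as one name.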

Details

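  • \b - a word boundary
  • (?!by\b) - the next word must not be by
  • [^\W\d_]+ - one or more letters
  • (?: *(?:, *)?[^\W\d_]+)* - a non-capturing group matching zero or more occurrences of:
    •  * (a literal space followed by *) - zero or more spaces
    • (?:, *)? - an optional sequence of a comma and zero or more spaces
    • [^\W\d_]+ - one or more letters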
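Each piece of the breakdown above can be checked in isolation. This Python 3 sketch uses made-up sample strings; it also shows that the (?!by\b) lookahead is case-sensitive, so a capitalized By is kept unless re.IGNORECASE is passed.

```python
import re

NAME_RE = re.compile(r"\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*")

# [^\W\d_]+ : word characters minus digits and underscore, i.e. runs of letters.
print(re.findall(r"[^\W\d_]+", "Ally_Foster42 José"))  # ['Ally', 'Foster', 'José']

# (?!by\b) skips the standalone word "by", but not words that merely start with it.
print(NAME_RE.findall("by bypass"))  # ['bypass']

# (?:, *)? lets a comma join two words into one name.
print(NAME_RE.findall("by Foster, Ally"))  # ['Foster, Ally']

# The lookahead is case-sensitive: "By" survives unless re.IGNORECASE is passed.
print(NAME_RE.findall("By Ally Foster"))  # ['By Ally Foster']
print(re.findall(NAME_RE.pattern, "By Ally Foster", re.IGNORECASE))  # ['Ally Foster']
```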

(?:by )?(\b(?!by\b)[\w, ]+\S)

My final version also does not match a string that consists only of by.
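A quick Python 3 check of the pattern above, which captures the name in group 1. Note that [\w, ]+ also accepts digits and underscores (unlike [^\W\d_]), and the trailing \S keeps the capture from ending in whitespace. The helper extract_name is hypothetical, and the first sample assumes real newlines rather than literal backslash-n pairs.

```python
import re

# Pattern from the answer above; group 1 is the name.
PATTERN = re.compile(r"(?:by )?(\b(?!by\b)[\w, ]+\S)")

def extract_name(text):
    """Hypothetical helper: return group 1 of the first match, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

samples = [
    "\n    \n       by Ally Foster\n    \n",  # assumes real newlines
    "Ally Foster",
    "by name name",
    "name name",
    "by",
]
for text in samples:
    print(repr(extract_name(text)))
# 'Ally Foster', 'Ally Foster', 'name name', 'name name', None
```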
