Regex to remove "by" from a string

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:by)? ([\w ]+)"

test_str = ("\\n \\n by Ally Foster\\n \\n \n\n"
    "\\n \\n Ally Foster\\n \\n \n\n"
    "by name name\n\n"
    "name name")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

2 Answers

User · #1 · Edited 2024-06-11 20:56:08

I suggest using
re.findall(r'\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*', s)
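See the regex demo. In Python 2, you need to pass the re.U flag so that all shorthand character classes and word boundaries are Unicode-aware. To match tabs as well as spaces, replace each space in the pattern with [ \t].
Details
\b - a word boundary
(?!by\b) - the next word cannot be by
[^\W\d_]+ - one or more letters
(?: *(?:, *)?[^\W\d_]+)* - a non-capturing group that matches zero or more occurrences of:
 * - zero or more spaces (a space followed by *)
(?:, *)? - an optional sequence of , followed by zero or more spaces
[^\W\d_]+ - one or more letters.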
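
A minimal sketch of how this pattern behaves, assuming Python 3 (where str patterns are Unicode-aware by default, so re.U is not needed). The helper name extract_names and the sample strings are made up for illustration and are not from the question.

```python
import re

# Pattern from the answer above: a run of letter-only words, optionally
# separated by commas, that does not start with the standalone word "by".
PATTERN = re.compile(r'\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*')


def extract_names(s):
    """Hypothetical helper: return every matching run of words in s."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(s)


for sample in ["by Ally Foster", "Ally Foster", "by name name", "Foster, Ally", "by"]:
    print(repr(sample), "->", extract_names(sample))
```

Note that the lookahead is only checked at the start of each match, so a "by" that follows another word stays in the result: extract_names("written by Ally") gives a single match covering the whole string. If that matters for your data, the input would need to be split first.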

User · #2 · Edited 2024-06-11 20:56:08

(?:by )?(\b(?!by\b)[\w, ]+\S)
My final version also does not match a string that consists only of by.
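
A quick way to check this pattern, sketched under the assumption that the text you want is capture group 1. The helper name strip_by and the sample inputs are hypothetical, chosen only for illustration.

```python
import re

# Pattern from this answer: an optional leading "by ", then a captured run of
# word characters, commas and spaces that must end on a non-space character.
PATTERN = re.compile(r'(?:by )?(\b(?!by\b)[\w, ]+\S)')


def strip_by(s):
    """Hypothetical helper: return group 1 of the first match, or None."""
    m = PATTERN.search(s)
    return m.group(1) if m else None


for sample in ["by Ally Foster", "Ally Foster", "by name name", "by"]:
    print(repr(sample), "->", strip_by(sample))
```

The trailing \S forces the capture to end on a non-space character, so trailing spaces are left out, and a string that is only by yields None.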
