Python regex question: strip multi-line comments but keep a line break

comments_test = ("hello // comment\n"
                 "line 2 /* a comment */\n"
                 "line 3 /* a comment*/ /*comment*/\n"
                 "line 4 /* a comment\n"
                 "continuation of a comment*/ line 5\n"
                 "/* comment */line 6\n"
                 "line 7 /*********\n"
                 "********************\n"
                 "**************/\n"
                 "line ?? /*********\n"
                 "********************\n"
                 "********************\n"
                 "********************\n"
                 "********************\n"
                 "**************/\n"
                 "line ??")

3 answers

User

#1 · Edited 2024-05-16 14:10:25

import re

comment_re = re.compile(
    r'(^)?[^\S\n]*/(?:\*(.*?)\*/[^\S\n]*|/[^\n]*)($)?',
    re.DOTALL | re.MULTILINE
)

def comment_replacer(match):
    start,mid,end = match.group(1,2,3)
    if mid is None:
        # single line comment
        return ''
    elif start is not None or end is not None:
        # multi line comment at start or end of a line
        return ''
    elif '\n' in mid:
        # multi line comment with line break
        return '\n'
    else:
        # multi line comment without line break
        return ' '

def remove_comments(text):
    return comment_re.sub(comment_replacer, text)

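(^)? matches if the comment starts at the beginning of a line, provided the MULTILINE flag is used.
[^\S\n]* matches any whitespace except a newline. If the comment starts on its own line, we don't want to consume the line break.
/\*(.*?)\*/ matches a multi-line comment and captures its content. The match is lazy, so it never spans two or more comments. The DOTALL flag makes . match newlines.
//[^\n]* matches a single-line comment. . can't be used here because of the DOTALL flag.
($)? matches if the comment ends at the end of a line, provided the MULTILINE flag is used.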
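To make the pieces listed above easier to see, here is a sketch of the same pattern written with re.VERBOSE. The name comment_re_verbose and the sample string are mine, for illustration only; the pattern itself is meant to be identical to comment_re.

```python
import re

# Same pattern as comment_re, spread out with re.VERBOSE so each piece is labeled.
comment_re_verbose = re.compile(
    r'''
    (^)?            # group 1: set if the comment starts at the beginning of a line
    [^\S\n]*        # horizontal whitespace before the comment (never a newline)
    /               # both comment styles start with a slash
    (?:
        \*(.*?)\*/  # block comment; group 2 lazily captures its body
        [^\S\n]*    # horizontal whitespace after the block comment
      |
        /[^\n]*     # line comment: second slash, then the rest of the line
    )
    ($)?            # group 3: set if the comment ends at the end of a line
    ''',
    re.DOTALL | re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample input, not taken from the question.
sample = "x = 1; /* one */ y = 2; // two\n/* three\nfour */ z = 3;\n"

groups = [m.group(1, 2, 3) for m in comment_re_verbose.finditer(sample)]
for g in groups:
    print(g)
# (None, ' one ', None)
# (None, None, '')
# ('', ' three\nfour ', None)
```

The three tuples line up with the branches in comment_replacer: a mid-line block comment (all of start/end unset), a line comment (mid is None), and a block comment that starts at the beginning of a line (start is the empty string, which is not None).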

Example:

>>> s = ("qwe /* 123\n"
...      "456\n"
...      "789 */ asd /* 123 */ zxc\n"
...      "rty // fgh\n")
>>> print('"' + '"\n"'.join(
...     remove_comments(s).splitlines()
... ) + '"')
"qwe"
"asd zxc"
"rty"
>>> comments_test = ("hello // comment\n"
...                  "line 2 /* a comment */\n"
...                  "line 3 /* a comment*/ /*comment*/\n"
...                  "line 4 /* a comment\n"
...                  "continuation of a comment*/ line 5\n"
...                  "/* comment */line 6\n"
...                  "line 7 /*********\n"
...                  "********************\n"
...                  "**************/\n"
...                  "line ?? /*********\n"
...                  "********************\n"
...                  "********************\n"
...                  "********************\n"
...                  "********************\n"
...                  "**************/\n"
...                  "line ??")
>>> print('"' + '"\n"'.join(
...     remove_comments(comments_test).splitlines()
... ) + '"')
"hello"
"line 2"
"line 3 "
"line 4"
"line 5"
"line 6"
"line 7"
"line ??"
"line ??"

Edit:

Updated to the new specification.
Added another example.

User

#2 · Edited 2024-05-16 14:10:25

Is this what you're looking for?

>>> print(s)
qwe /* 123
456
789 */ asd
>>> print(re.sub(r'\s*/\*.*\n.*\*/\s*', '\n', s, flags=re.S))
qwe
asd

This only applies to comments that span more than one line and leaves other comments alone.
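One caveat that may be worth checking before relying on this: both .* in the pattern are greedy, so with re.S a multi-line comment followed by another comment later in the text can be matched as one span. The snippet below is only a sketch with made-up input; lazy_re is a hypothetical variant that stops at the first closing */, not part of the answer above.

```python
import re

# Hypothetical input: a multi-line comment followed by a single-line one.
s = "a /* x\ny */ b /* z */ c"

# The pattern from the answer: greedy, so it runs to the last "*/".
greedy = re.sub(r'\s*/\*.*\n.*\*/\s*', '\n', s, flags=re.S)
print(repr(greedy))  # 'a\nc'  (the code "b" between the comments is lost)

# Assumed alternative: never step over a "*/" before the line break,
# then stop lazily at the first "*/" after it.
lazy_re = re.compile(r'\s*/\*(?:(?!\*/).)*?\n.*?\*/\s*', re.S)
lazy = lazy_re.sub('\n', s)
print(repr(lazy))  # 'a\nb /* z */ c'
```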

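User

#3 · Edited 2024-05-16 14:10:25

The fact that you even have to ask this question, and that the solutions given are, shall we say, less than perfectly readable :-) should be a good indication that REs are not the real answer here.

From a readability point of view, you'd be far better off actually coding this up as a relatively simple parser.

Too often, people try to be "clever" with REs (I don't mean that in a disparaging way), thinking that a one-liner is elegant, but all they end up with is an unmaintainable mess of characters. I'd rather have a fully commented 20-line solution that I can understand in an instant.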
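For what it's worth, here is a minimal sketch of what such a hand-written scanner might look like. The function name strip_comments is hypothetical, and the sketch makes simplifying assumptions: it does not know about string literals, and unlike the regex in answer #1 it does not trim the whitespace around a removed comment.

```python
def strip_comments(text):
    """Remove // and /* */ comments with a simple left-to-right scan."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith('//', i):
            # Line comment: skip to the end of the line, keep the newline itself.
            end = text.find('\n', i)
            i = n if end == -1 else end
        elif text.startswith('/*', i):
            # Block comment: skip to the closing */ (or to the end of the
            # input if the comment is never closed).
            end = text.find('*/', i + 2)
            body = text[i + 2:] if end == -1 else text[i + 2:end]
            i = n if end == -1 else end + 2
            # Keep one line break if the comment spanned lines, else one space.
            out.append('\n' if '\n' in body else ' ')
        else:
            # Ordinary character: copy it through unchanged.
            out.append(text[i])
            i += 1
    return ''.join(out)


sample = "qwe /* 123\n456\n789 */ asd /* 123 */ zxc\nrty // fgh\n"
cleaned = [line.strip() for line in strip_comments(sample).splitlines()]
print(cleaned)  # ['qwe', 'asd   zxc', 'rty']
```

The extra spaces in 'asd   zxc' come from the untrimmed whitespace on either side of the removed comment; collapsing them would be a separate, easy-to-read step.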
