Non-greedy DOTALL regular expressions in Python

0 votes

4 answers

1112 views

Asked 2025-04-18 15:28

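I need to parse the doc comments on methods written in PHP. I wrote a regular expression (a simplified example is below) to find them, but it does not behave as I expected. Instead of matching the shortest stretch of text between /** and */, it matches the largest possible amount of source code (earlier methods together with their comments). I am sure I am using .*?, the non-greedy version of *, and I have found nothing suggesting that DOTALL turns it off. Where might the problem be? Thanks.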

p = re.compile(r'(?:/\*\*.*?\*/)\n\s*public', re.DOTALL)
methods = p.findall(text)

regex php non-greedy comment-parsing dotall

4 Answers

You can use this:

\/\*\*([^*]|\*[^/])*?\*\/\s*public

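Inside the comment, this pattern matches any character that is not an asterisk (*); an asterisk is allowed only when it is not followed by a slash. As a result the match cannot run past a closing */, so it captures only the comment that closes immediately before "public", not an earlier one.

Example: http://regexr.com/398b3

An explanation is available here: http://tinyurl.com/lcewdmo

Note: this approach does not work if the comment itself contains */.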
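As a rough sketch of how this pattern might be used from Python (the sample PHP string and the variable names are made up for illustration): the group is written as non-capturing here, because re.findall returns only the group contents when the pattern contains a capturing group.

```python
import re

# Hypothetical sample input, not taken from the question.
php_source = "/** a */\nclass A {\n/** b */\npublic function f() {}\n}"

# Same idea as the pattern above, with (?:...) instead of (...) so that
# findall returns the whole match instead of the group contents.
pattern = re.compile(r'/\*\*(?:[^*]|\*[^/])*?\*/\s*public')

matches = pattern.findall(php_source)
print(matches)  # only the comment directly before "public" is reported
```

No DOTALL flag is needed in this variant, since a negated character class such as [^*] already matches newlines.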

Answered 2025-04-18 by Python大师


When a library feature does not behave as expected, it usually helps to re-check the code for mistakes or skipped steps and to look at solutions other programmers have shared for similar problems. Patience and careful analysis go a long way.

import re

# Some examples, assuming that the annotation you want to parse
# starts with /** and ends with */. It may be spread over
# several lines.

text = """
/**
 @Title(value='Welcome', lang='en')
 @Title(value='Wilkommen', lang='de')
 @Title(value='Vitajte', lang='sk')
 @Snippet
    ,*/
class WelcomeScreen {}

   /** @Target("method") */
  class Route extends Annotation {}

/** @Mapping(inheritance = @SingleTableInheritance,
    columns = {@ColumnMapping('id'), @ColumnMapping('name')}) */
public Person {}

"""

text2 = """ /** * comment */
CLASS MyClass extens Base {

/** * comment */
public function xyz
"""

# Match a PHP annotation and the name that follows "class" or
# "public function".
# Caveat: the lazy gap (?:.*?) can skip over unrelated code until it reaches
# the next "class" or "public function", so an annotation that is not
# directly followed by one of them gets paired with a later name.
annotations = re.findall(
    r"""/\*\*                         # Start of annotation
        (?P<annote>.*?)               # Named, non-greedy match (includes newlines)
        \*/                           # End of annotation
        (?:.*?)                       # Non-capturing, non-greedy gap (includes newlines)
        (?:public[ ]+function|class)  # Either keyword
        [ ]+                          # One or more spaces
        (?P<name>\w+)                 # The name that follows
    """,
    text + text2,
    re.VERBOSE | re.DOTALL | re.IGNORECASE)

for annote, name in annotations:
    print("Annotation:", " ".join(annote.split()))
    print("Name:", name)

Answered 2025-04-18 by Python大师


I think this is what you want:

>>> import re
>>> text = """ /** * comment */ class MyClass extens Base { /** * comment */ public function xyz """
>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/\s*public', text, re.DOTALL)
>>> m
['/** * comment */ public']

If you do not want public to appear in the final match, use the following regex instead, which relies on a positive lookahead:

>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/(?=\s*public)', text, re.DOTALL)
>>> m
['/** * comment */']
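Building on the same idea, here is a minimal sketch that also captures the comment body and the method name. It assumes methods are declared exactly as "public function name" (no static or other modifiers in between); the function name docblocks_by_method and the sample source are hypothetical.

```python
import re

# The tempered dot (?:(?!\*/).) never steps over a closing */,
# so each match stays inside a single comment.
DOC_BEFORE_PUBLIC = re.compile(
    r'/\*\*(?P<body>(?:(?!\*/).)*)\*/\s*public\s+function\s+(?P<name>\w+)',
    re.DOTALL,
)


def docblocks_by_method(php_source):
    """Map each public method name to its whitespace-normalized doc comment."""
    return {
        m.group('name'): ' '.join(m.group('body').split())
        for m in DOC_BEFORE_PUBLIC.finditer(php_source)
    }


# Hypothetical sample input for illustration.
sample = """
/** class doc */
class A {
    /** * first */
    public function foo() {}

    /** * second
      * line */
    public function bar() {}
}
"""

print(docblocks_by_method(sample))
```

The class-level comment is skipped because it is not directly followed by "public function".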

Answered 2025-04-18 by Python大师


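The regex engine works from left to right. A lazy quantifier matches as little as possible from the current match position, but it cannot move the start of the match forward, even if doing so would shorten the matched text. So the match does not begin at the last /** before public. It begins at the first /** and runs until the next */ that is followed by public.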
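A small sketch of this behaviour, using a made-up string with << and >> as stand-ins for the comment delimiters (the names here are only for illustration):

```python
import re

# Two delimited chunks; only the second is directly followed by " END".
sample = "<<one>> filler <<two>> END"

# The lazy .*? keeps the start fixed at the first "<<" and simply grows
# until ">> END" can finally match.
lazy = re.search(r'<<.*?>> END', sample)

print(lazy.start())
print(lazy.group())
```

The match begins at index 0 and covers the whole string, even though a shorter match starting at the second << exists.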

If you want to keep */ from being matched inside the comment, wrap the . in a group guarded by a negative lookahead:

(?:(?!\*/).)

The (?!\*/) asserts that the character about to be matched is not the start of a */.
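To make the difference concrete, here is a hedged side-by-side sketch of the lazy pattern and the lookahead-guarded one. The PHP snippet and variable names are invented for the example; the patterns follow the ones discussed in this thread.

```python
import re

# Hypothetical PHP source with a class comment and a method comment.
php_source = """\
/** first comment */
class MyClass extends Base {

/** second comment */
public function xyz() {}
}
"""

# Lazy version: may run across the first */ to reach "public".
lazy = re.compile(r'/\*\*.*?\*/\s*public', re.DOTALL)

# Guarded version: each . is checked so it never consumes the start of */.
tempered = re.compile(r'/\*\*(?:(?!\*/).)*\*/\s*public', re.DOTALL)

lazy_match = lazy.search(php_source).group()
tempered_match = tempered.search(php_source).group()

print(repr(lazy_match))
print(repr(tempered_match))
```

The lazy match starts at the first comment and swallows the class declaration, while the guarded match contains only the comment right before public.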

Answered 2025-04-18 by Python大师
