Negative regex before a specific term

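2 answers

User

Answer 1 · edited 2024-04-20 07:20:11

The trick here is to use a regex that matches from the start of the line, because that lets us check whether the word we want to match is preceded by a comment marker: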

^([^%\n]*?)(?<!\\section{)(?<!\\paragraph{)\b(Astah)\b

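This requires the multiline flag m. Each match of this regex is replaced with \1\\gloss{\2}, where \1 is the text before the term on that line and \2 is the term itself.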
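As a minimal sketch of how this could be applied with Python's standard re module: the sample text and the add_gloss helper are hypothetical names made up for illustration, and the braces are escaped in the pattern only for readability.

```python
import re

# Pattern from the answer above; re.MULTILINE makes ^ match at every line start.
PATTERN = re.compile(
    r'^([^%\n]*?)(?<!\\section\{)(?<!\\paragraph\{)\b(Astah)\b',
    re.MULTILINE,
)

def add_gloss(text):
    # \1 puts back the text that preceded the term, \2 is the term itself.
    return PATTERN.sub(r'\1\\gloss{\2}', text)

# Hypothetical sample input, one LaTeX line per list entry.
sample = "\n".join([
    r"\section{Astah}",
    r"Astah is a UML diagramming tool.",
    r"% Astah mentioned in a comment",
    r"We use Astah here.",
])

print(add_gloss(sample))
```

Because the pattern is anchored with ^, a single pass wraps at most one occurrence per line, so a line that mentions the term twice would need the substitution to be repeated.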

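User

Answer 2 · edited 2024-04-20 07:20:11

Here are my two cents:

First, we need to use the regex module by Matthew Barnett. It brings a lot of interesting features. One of them is useful in this case: the added (*SKIP) and (*FAIL) verbs.

From the documentation: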

Added (*PRUNE), (*SKIP) and (*FAIL) (Hg issue 153)
(*PRUNE) discards the backtracking info up to that point. When used in an atomic group or a lookaround, it won’t affect the enclosing pattern.
(*SKIP) is similar to (*PRUNE), except that it also sets where in the text the next attempt to match will start. When used in an atomic group or a lookaround, it won’t affect the enclosing pattern.
(*FAIL) causes immediate backtracking. (*F) is a permitted abbreviation.

So let's build the pattern and test it with the regex module:

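import regex  # third-party module by Matthew Barnett, not the standard-library re

# Unwanted contexts come first and end in (*SKIP)(*FAIL); the wanted terms come last and are captured.
pattern = regex.compile(r'%.*(*SKIP)(*FAIL)|\\section{.*}(*SKIP)(*FAIL)|(Astah|UML|use case)')

# Raw string, so that \s in \section is not read as an escape sequence.
s = r"""
    \section{Astah}
    Astah is a UML diagramming tool... bla bla...
    % use case:
    A use case is a...
"""

# print is a function in Python 3.
print(regex.sub(pattern, r'\\gloss{\1}', s))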

Output:

\section{Astah}
\gloss{Astah} is a \gloss{UML} diagramming tool... bla bla...
% use case:
A \gloss{use case} is a...

The pattern:

This quote sums it up well:

the trick is to match the various contexts we don't want so as to "neutralize them".

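On the left, we write the contexts we don't want. On the right (the last part), we capture what we actually want. All the contexts are separated by the alternation symbol |, and only the last one (the one we want) is captured.

Because we are doing a replacement here, we need (*SKIP)(*FAIL) so that the matched parts we don't want to replace stay intact.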

Meaning of the pattern:

%.*(*SKIP)(*FAIL)|\\section{.*}(*SKIP)(*FAIL)|(Astah|UML|use case)

%.*(*SKIP)(*FAIL)              # Match a comment to the end of the line, then skip it and fail
|                              # or
\\section{.*}(*SKIP)(*FAIL)    # Match a \section{...} heading, then skip it and fail
|                              # or
(Astah|UML|use case)           # Match one of the wanted terms and capture it
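For comparison, here is a rough sketch of the same "match the unwanted contexts first" idea using only the standard-library re module, which has no (*SKIP)(*FAIL). It swaps in a different technique: a replacement callback returns the unwanted matches unchanged and wraps only the captured terms. The names wrap_terms and repl are hypothetical, and the braces are escaped for readability.

```python
import re

# Unwanted contexts first (not captured); wanted terms last (captured as group 1).
PATTERN = re.compile(r'%.*|\\section\{.*\}|(Astah|UML|use case)')

def wrap_terms(text):
    def repl(match):
        # Group 1 is None when a comment or a \section{...} heading matched:
        # hand that text back untouched.
        if match.group(1) is None:
            return match.group(0)
        # Otherwise a wanted term matched: wrap it in \gloss{...}.
        return r'\gloss{' + match.group(1) + '}'
    return PATTERN.sub(repl, text)

sample = r"""\section{Astah}
Astah is a UML diagramming tool... bla bla...
% use case:
A use case is a..."""

print(wrap_terms(sample))
```

The callback plays the role that (*SKIP)(*FAIL) plays in the regex module: the unwanted contexts are still consumed by the match, so the terms inside them can never be matched on their own.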

This simple trick is explained in more detail on RexEgg.

Hope it helps.
