Regex for handling brackets

Posted 2024-05-16 18:42:38


I have several strings, such as

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]""" 
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""
strings = [string1, string2, string3]

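Each string contains one or more "[br]".

Each string may or may not contain annotations.

Each annotation starts with "[*" and ends with "]". It may include double brackets ("[[" and "]]"), but never single brackets ("[" and "]"), so there is no ambiguity (e.g. [*some annotation with [[brackets]]]).

The text I want to replace is everything between the first "[br]" and the annotation (or the end of the string, if there is no annotation). For the three strings above that is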

word1 = """팔짱낄 공''':'''"""
word2 = """낟알 과'''-'''"""
word3 = """둘레 곽[br]클 확"""

So I tried

import re

for string in strings:
    print(re.sub(r"\[br\](.)+?(\[\*)+", "AAAA", string))

expecting something like this:

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]

The logic of the regex is:

\[br\]: the first "[br]"

(.)+?: one or more characters that I want to replace, matched lazily

(\[\*)+: one or more "[*"s

But the result is

[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[顆|{{{#!html}}}]]AAAA some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]]AAAA another annotation.][* another annotation.]

I also tried r"\[br\](.)+?(\[\*)*" instead, but it still doesn't work. How can I fix this?
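A quick way to see what went wrong is to print what the question's pattern actually matches, because re.sub replaces the whole match, not just the (.)+? group. In this sketch (using two of the question's strings), string1 contains no "[*", so there is no match and the string comes back unchanged; in string3 the match starts at "[br]" and ends with "[*", so both markers disappear together with the text, which is exactly the result shown above.

```python
import re

# The pattern from the question, unchanged
pattern = r"\[br\](.)+?(\[\*)+"

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""

# m.group(0) is exactly the text that re.sub() would replace with "AAAA"
matched = []
for s in (string1, string3):
    m = re.search(pattern, s)
    matched.append(m.group(0) if m else None)

for text in matched:
    print(text)
```

The fix in both answers follows from this: either capture "[br]" and the annotation and put them back, or check for the annotation with a lookahead so it is never consumed.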


2 answers

The best I could come up with is to check first whether there are any annotations:

import re
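r = re.compile(r'''
    (\[br])          # group 1: the first "[br]"
    (.*?)            # group 2: the text to replace (lazy)
    (\[\*.*\]$)      # group 3: the annotation(s), running to the end of the string
''', re.VERBOSE)

annotation = re.compile(r'''
    (\[\*.*]$)       # an annotation that ends the string
''', re.VERBOSE)

def replace(m):
    # keep "[br]" and the annotation(s), replace only the text between them
    return m.group(1) + "AAAA" + m.group(3)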

for s in string1, string2, string3:
    print()
    print(s)
    if annotation.search(s):
        print(r.sub(replace, s))
    else:
        print(re.sub(r'\[br](.*)', '[br]AAAA', s))

It gives the expected output:

[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[拱|{{{#!html}}}]][br]AAAA

[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]

[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]

I suppose you could move the if into the replace function, but I'm not sure that would be much of an improvement. It would look like this:

import re
r = re.compile(r'''
    ^(?P<prefix>.*)
    (?P<br>\[br].*?)
    (?P<annotation>\[\*.*\])?
    (?P<rest>[^\[]*)$
''', re.VERBOSE)

def replace(m):
    g = m.groupdict()
    # prefix is greedy, so it holds everything before the LAST [br];
    # splitting on [br] keeps only the part before the first one
    head = g['prefix'].split('[br]')[0]
    if g['annotation'] is None:
        # no annotation: everything after the first [br] is replaced
        return head + "[br]AAAA"
    return head + "[br]AAAA" + g['annotation'] + g['rest']

for s in string1, string2, string3:
    print()
    print(s)
    print(r.sub(replace, s))

You can use

^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)

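The pattern matches:

  • ^ start of the string
  • (.*?\[br]) capture group 1, match as few characters as possible up to the first occurrence of [br]
  • .+? match any character 1+ times, lazily
  • (?= positive lookahead, assert that to the right is
    • \[\*.*?](?<!].)(?!]) [* up to a ] that is neither preceded nor followed by another ] (so the ]] inside an annotation is skipped)
    • | or
    • $ the end of the string
  • ) close the lookahead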

Replace with capture group 1 followed by AAAA, i.e. \1AAAA
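To see why the (?<!].)(?!]) guards are needed, compare a plain lazy \[\*.*?] with the guarded version on the annotation from the question's second string. This is only a small illustration of that one sub-pattern, not the full replacement.

```python
import re

annotation = """[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""

# Plain lazy match: stops at the first "]", which is the first half of "]]"
naive = re.search(r"\[\*.*?]", annotation).group()

# With the guards: the matched "]" must not be preceded by "]" ((?<!].))
# and must not be followed by "]" ((?!])), so the "]]" pair is skipped
guarded = re.search(r"\[\*.*?](?<!].)(?!])", annotation).group()

print(naive)
print(guarded)
```

The naive match is cut off inside the annotation, while the guarded one runs to the closing single "]".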


Example code

import re

pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"

s = ("[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''\n"
     "[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', \") and brackets(\"(\", \")\", \"[[\", \"]]\").]\n"
     "[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]")

# \1 is capture group 1 (everything up to and including the first [br]);
# re.MULTILINE makes ^ and $ match at the start and end of each line
result = re.sub(pattern, r"\1AAAA", s, flags=re.MULTILINE)
print(result)

Output

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
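If the strings are kept separately in a list, as in the question, the same pattern can be applied to each one. A minimal sketch, assuming the strings contain no newlines: re.MULTILINE is then unnecessary, because ^ and $ already refer to the start and end of each individual string.

```python
import re

pattern = re.compile(r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)")

strings = [
    """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''""",
    """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]""",
    """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]""",
]

# ^ can only match once per string, so each string gets at most one replacement
results = [pattern.sub(r"\1AAAA", s) for s in strings]
for line in results:
    print(line)
```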
