Regular expression for handling brackets

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""
strings = [string1, string2, string3]

Expected output:

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]

Current (incorrect) output:

[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[顆|{{{#!html}}}]]AAAA some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]]AAAA another annotation.][* another annotation.]

2 answers

User

#1 · Edited 2024-05-16 18:42:38

The best approach I can think of is to first check whether the string has any annotations:

import re
r = re.compile(r'''
    (\[br])      
    (.*?)
    (\[\*.*\]$)
''', re.VERBOSE)

annotation = re.compile(r'''
    (\[\*.*]$)
''', re.VERBOSE)

def replace(m):
    return m.group(1) + "AAAA" + m.group(3)

for s in string1, string2, string3:
    print()
    print(s)
    if annotation.search(s):
        print(r.sub(replace, s))
    else:
        print(re.sub(r'\[br](.*)', '[br]AAAA', s))

It gives the expected output:

[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[拱|{{{#!html}}}]][br]AAAA

[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]

[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
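
For reuse, the same two-step logic can be wrapped in a function and checked against the expected strings. This is only a minimal sketch: the function name `replace_reading`, the `placeholder` parameter and the inline sample strings are assumptions made for illustration, not part of the original answer.

```python
import re

# Same patterns as in the answer above, written without re.VERBOSE.
WITH_ANNOTATION = re.compile(r'(\[br])(.*?)(\[\*.*\]$)')
HAS_ANNOTATION = re.compile(r'\[\*.*]$')


def replace_reading(s, placeholder="AAAA"):
    """Hypothetical helper: replace the text between the first [br] and the annotations."""
    if HAS_ANNOTATION.search(s):
        # Keep the [br] marker and the trailing annotation block.
        return WITH_ANNOTATION.sub(
            lambda m: m.group(1) + placeholder + m.group(3), s)
    # No annotation: replace everything after the first [br].
    return re.sub(r'\[br].*', lambda m: '[br]' + placeholder, s)


samples = {
    "[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''":
        "[[拱|{{{#!html}}}]][br]AAAA",
    "[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]":
        "[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]",
}

for source, expected in samples.items():
    assert replace_reading(source) == expected
```

Using a function as the replacement avoids any surprises if the placeholder ever contains a backslash.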

I suppose you could move the if into the replace function, but I'm not sure that would be much of an improvement. It would look like this:

import re
r = re.compile(r'''
    ^(?P<prefix>.*)
    (?P<br>\[br].*?)
    (?P<annotation>\[\*.*\])?
    (?P<rest>[^\[]*)$
''', re.VERBOSE)

def replace(m):
    g = m.groupdict()
    # the prefix will contain all but the last [br], thus the split...
    head = g['prefix'].split('[br]')[0] + "[br]AAAA"
    if g['annotation'] is None:
        # without an annotation, 'rest' is the text after the last [br],
        # i.e. exactly what should be replaced, so it is dropped
        return head
    return head + g['annotation'] + g['rest']

for s in string1, string2, string3:
    print()
    print(s)
    print(r.sub(replace, s))
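
To see why the split on '[br]' is needed, it can help to look at what each named group captures. The sketch below assumes the same pattern as above and inspects the third sample string, which contains two [br] markers; the variable names `pattern`, `s` and `groups` are chosen here for illustration.

```python
import re

pattern = re.compile(r'''
    ^(?P<prefix>.*)
    (?P<br>\[br].*?)
    (?P<annotation>\[\*.*\])?
    (?P<rest>[^\[]*)$
''', re.VERBOSE)

s = "[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"
groups = pattern.match(s).groupdict()

# The greedy prefix runs up to the LAST [br], so it still contains the first one.
assert groups['prefix'] == "[[廓|{{{#!html}}}]][br]둘레 곽"
assert groups['br'] == "[br]클 확"
assert groups['annotation'] == "[* another annotation.][* another annotation.]"
assert groups['rest'] == ""

# Cutting the prefix at the first [br] recovers the part that must be kept.
assert groups['prefix'].split('[br]')[0] == "[[廓|{{{#!html}}}]]"
```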

User

#2 · Edited 2024-05-16 18:42:38

You can use

^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)

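The pattern matches:

^ Start of the string (start of each line with re.MULTILINE)
(.*?\[br]) Capture group 1: match as few characters as possible up to the first occurrence of [br]
.+? Match any character 1 or more times, as few as possible
(?= Positive lookahead, assert that what follows on the right is
- \[\*.*?](?<!].)(?!]) Match from [* up to a ] that is neither preceded nor followed by another ]
- | Or
- $ Assert the end of the string (end of each line with re.MULTILINE)
) Close the lookahead

Replace with capture group 1 followed by AAAA, i.e. \1AAAA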
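
The effect of the two lookarounds around the closing bracket can be checked in isolation. The following is a small sketch that uses only the annotation part of the pattern; the names `annotation_end`, `text` and `naive` are made up for this illustration.

```python
import re

# Only the annotation part of the pattern: "[*", lazy content, then a "]"
# that is neither preceded nor followed by another "]".
annotation_end = re.compile(r"\[\*.*?](?<!].)(?!])")

text = '[* brackets("[[", "]]").] trailing'
m = annotation_end.match(text)
assert m.group(0) == '[* brackets("[[", "]]").]'

# Without the lookarounds, the lazy match stops at the first "]".
naive = re.match(r"\[\*.*?]", text)
assert naive.group(0) == '[* brackets("[[", "]'
```

One consequence of this design: an annotation whose content ends in "]]" directly before its own closing bracket, such as [* see [[x]]], is not matched by this part of the pattern, because every candidate "]" has another "]" next to it.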


Example code

import re

pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"

s = ("[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''\n"
            "[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', \") and brackets(\"(\", \")\", \"[[\", \"]]\").]\n"
            "[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]")

subst = r"\1AAAA"  # Python uses \1 for group references, not $1
result = re.sub(pattern, subst, s, flags=re.MULTILINE)
print(result)

Output

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
