Python: find multiple regex matches inside a regex match

<font style = "font-family:inherit> <any other HTML tags> random text <table cellpadding="0" cellspacing="0" style="font-family:times new roman;font-size:10pt;width:100%;border-collapse:collapse;text-align:left;"> <tr> <td colspan="3"> <font style="font-family:inherit;font-size:12pt;font- weight:bold;">washington, d.c. 20549</font> random text <any other HTML tags within table tags> </td> </table> random text <font style = "font-family:inherit>

<font style = "font-family:inherit> <any other HTML tags> random text {table cellpadding="0" cellspacing="0" style="font-family:times new roman;font-size:10pt;width:100%;border-collapse:collapse;text-align:left;"} {tr} {td colspan="3"} {font style="font-family:inherit;font-size:12pt;font- weight:bold;"}washington, d.c. 20549{/font} random text {any other HTML tags within table tags} {/td} {/table} random text <font style = "font-family:inherit>

3 answers

User

Answer 1 · edited 2024-04-16 05:54:42

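Don't be too hard on yourself. I'm not sure this can be done in one shot with a standard re.sub. In fact, I think it is either impossible or very complicated, e.g. it would need a custom function as the replacement (and you could keep stuffing more logic into that function until it becomes an entire HTML parser).

Instead, I'd strongly suggest a simple solution: split and reassemble the string with split/join, or perhaps work out a sequence of re.sub replacements.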

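Assuming a single table, l = s.split('table') cuts the string at both table tags, so l[1] holds the table content (the rest of the opening tag plus the body), which you can convert and join back together. Here is a version that handles multiple tables: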

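def curlyfy_el(s, tag='table'):
    # every piece except the first (j == 0, text before any table) starts right after '<table';
    # only its part before the matching '</table>' (i == 0) gets < and > converted
    return ('{%s' % tag).join(
        [('{/%s}' % tag).join(
            [y if i != 0 or j == 0 else y.replace("<", "{").replace(">", "}")
             for i, y in enumerate(x.split('</%s>' % tag, 1))])
         for j, x in enumerate(s.split('<%s' % tag))])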

A slightly more readable version:

def curlyfy_el(s, tag='table'):
    h, *t = s.split('<%s' % tag)  # split into the pre-table text and fragments that each start right after '<table'
    r = [h]
    for x in t:
        head, *tail = x.split('</%s>' % tag, 1)  # table body and the rest; maxsplit=1 keeps any later '</table>' in the rest
        head = head.replace("<", "{")
        head = head.replace(">", "}")
        r.append(('{/%s}' % tag).join([head, *tail]))
    return ('{%s' % tag).join(r)  # put back the (now curly) opening '{table'
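For the single-table case described at the start of this answer, str.partition (used here instead of split) hands back the three pieces directly. This is a minimal sketch that assumes exactly one, non-nested, lowercase <table>; the function name curlyfy_single_table is hypothetical.

```python
def curlyfy_single_table(s):
    """Convert <...> to {...} inside the first <table>...</table>; leave the rest as-is."""
    before, open_tag, rest = s.partition('<table')
    if not open_tag:
        return s  # no table at all
    body, close_tag, after = rest.partition('</table>')
    # body still starts with the rest of the opening tag, e.g. ' cellpadding="0">...'
    body = body.replace('<', '{').replace('>', '}')
    closing = '{/table}' if close_tag else ''
    return before + '{table' + body + closing + after


print(curlyfy_single_table('text <b>x</b> <table id="t"><tr><td>1</td></tr></table> more <i>y</i>'))
# text <b>x</b> {table id="t"}{tr}{td}1{/td}{/tr}{/table} more <i>y</i>
```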

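In general, it's better to process HTML with a dedicated parsing library such as BeautifulSoup; ad-hoc code like this fails in many cases (nested tables or upper-case <TABLE> tags, for example).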
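As a rough sketch of the parser-based route, here is a version built on the standard library's html.parser.HTMLParser, swapped in for BeautifulSoup only so the example has no third-party dependency. It re-emits the document and converts tags to curly braces while it is inside a <table> element (nested tables included). The names CurlyTableParser and curlyfy_tables are hypothetical; end tags are re-emitted with the lowercase name the parser reports, and badly malformed markup (such as an unclosed attribute quote) may not round-trip exactly.

```python
from html.parser import HTMLParser


class CurlyTableParser(HTMLParser):
    """Re-emit the input, turning <...> into {...} for tags inside <table> elements."""

    def __init__(self):
        # convert_charrefs=False so entities like &amp; reach handle_entityref unchanged
        super().__init__(convert_charrefs=False)
        self.out = []
        self.table_depth = 0  # > 0 while we are inside a <table> element

    def _emit_tag(self, raw):
        if self.table_depth:
            raw = raw.replace('<', '{').replace('>', '}')
        self.out.append(raw)

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.table_depth += 1  # count first so the <table> tag itself is converted
        self._emit_tag(self.get_starttag_text())  # original text of the start tag

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>, converted only inside a table
        self._emit_tag(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit_tag('</%s>' % tag)  # rebuilt from the lowercase tag name
        if tag == 'table' and self.table_depth:
            self.table_depth -= 1  # decrement after emitting so </table> is converted

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def curlyfy_tables(html):
    parser = CurlyTableParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(curlyfy_tables('a <b>x</b> <table class="t"><tr><td>1</td></tr></table> c'))
# a <b>x</b> {table class="t"}{tr}{td}1{/td}{/tr}{/table} c
```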

User

Answer 2 · edited 2024-04-16 05:54:42

As Serge mentioned, this isn't really a problem to solve with a single regex, but rather with several regexes and a bit of Python magic:

import re

def replacer(match):  # re.sub can take a function as the repl argument, which gives you more flexibility
    choices = {'<': '{', '>': '}'}  # replace < with { and > with }
    return choices[match.group(0)]

# your_text is the HTML string from the question
result = []  # store the results here
for text in re.split(r'(?s)(?=<table)(.*?)(?<=</table>)', your_text):  # split your text into table parts and non-table parts
    if text.startswith('<table'):  # if this is a table part, do the <> replacement
        result.append(re.sub(r'[<>]', replacer, text))
    else:  # otherwise leave it the same
        result.append(text)
print(''.join(result))  # join the list of strings to get the final result

See the re.sub documentation for how to pass a function as the repl argument.

And an explanation of the regex:

(?s)          # makes . match newlines too
(?=<table)    # positive look-ahead matching '<table'
(.*?)         # lazily matches everything from <table up to the nearest </table> (inclusive, because the look-ahead/behind consume nothing); lazy so each table is matched separately
(?<=</table>) # positive look-behind matching '</table>'

Also note that because (.*?) is in a capturing group, the matched table text is included in the list returned by re.split (see the re.split documentation).
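Because re.sub accepts a function as repl, the split step can also be folded into a single re.sub whose pattern matches one whole table at a time. This is a sketch assuming tables are not nested and the tag is written in lowercase; curly_tables, TABLE_RE and BRACKETS are hypothetical names, and str.translate stands in for the replacer function above.

```python
import re

# DOTALL lets . cross newlines; the lazy .*? stops at the nearest </table>,
# so each (non-nested) table is matched separately.
TABLE_RE = re.compile(r'<table.*?</table>', re.DOTALL)

# Translation table mapping '<' to '{' and '>' to '}'
BRACKETS = str.maketrans('<>', '{}')


def curly_tables(text):
    # re.sub calls the lambda once per match and uses its return value as the replacement
    return TABLE_RE.sub(lambda m: m.group(0).translate(BRACKETS), text)


print(curly_tables('x <b>1</b> <table a="1"><tr></tr></table> y <table><td></td></table> z'))
# x <b>1</b> {table a="1"}{tr}{/tr}{/table} y {table}{td}{/td}{/table} z
```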

User

Answer 3 · edited 2024-04-16 05:54:42

You can match with the following regex and then replace the match with Group 1:

[\s\S]*(<table[\s\S]*?</table>)[\s\S]*

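This matches everything up to the last '<table' (the leading [\s\S]* is greedy), captures that table and its content in Group 1, and then matches everything after it.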

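Replace with:

$1

(That is the replacement syntax of most regex testers; in Python's re.sub, write it as r'\1' or r'\g<1>'.)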

That leaves you with just the table and its content.
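A sketch of this answer's regex in Python, using a made-up two-table input to show which table survives. It also shows re.search and re.findall as an alternative when you only need the table text rather than a rewritten string.

```python
import re

html = ('intro <font>a</font> <table id="1"><tr><td>x</td></tr></table> '
        'middle <table id="2"><tr><td>y</td></tr></table> outro')

# The regex from this answer; Python's re.sub uses \1 (or \g<1>) instead of $1.
pattern = r'[\s\S]*(<table[\s\S]*?</table>)[\s\S]*'
only_table = re.sub(pattern, r'\1', html)
print(only_table)
# <table id="2"><tr><td>y</td></tr></table>   (the greedy prefix selects the LAST table)

# Pulling out the table text directly, without rebuilding the string:
first = re.search(r'<table[\s\S]*?</table>', html).group(0)
all_tables = re.findall(r'<table[\s\S]*?</table>', html)
print(first)            # <table id="1"><tr><td>x</td></tr></table>
print(len(all_tables))  # 2
```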
