Python regex on wikitext temp

1 answer

User

#1 · Posted 2024-04-25 07:56:45

You need to turn on newline matching for `.`; otherwise it does not match newlines:

re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)

The text you want to match contains several newlines, so matching a single run of consecutive newlines is not enough.

From the re.DOTALL documentation:

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
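As a small illustration of the quoted behaviour, the sketch below runs the same pattern with and without the flag. The sample string and variable names are made up for this example.

```python
import re

text = "a\nb"  # illustrative input: two characters separated by a newline

# Without re.DOTALL, '.' refuses to match '\n', so the pattern cannot span lines.
no_flag = re.search(r"a.b", text)

# With re.DOTALL, '.' matches any character, including '\n'.
with_flag = re.search(r"a.b", text, flags=re.DOTALL)

print(no_flag)                   # None
print(repr(with_flag.group(0)))  # 'a\nb'
```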

You can remove all the line breaks in the cite section in one go with a single re.sub() call, without a loop:

re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)

This uses a nested regular expression to remove, from the matched text, every run of whitespace that contains at least one newline.

Demo:

>>> import re
>>> inputtext = '''\
... {{cite web
... |title=Testing
... |url=Testing
... |editor=Testing
... }}
... '''
>>> re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)
<_sre.SRE_Match object at 0x10f335458>
>>> re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)
'{{cite web|title=Testing|url=Testing|editor=Testing}}\n'
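For reuse, the same idea could be wrapped in a function. The sketch below is a variation rather than the exact pattern from the answer: it matches the whole `{{cite web ... }}` template up to the first `}}`, so a template that is already on one line is left untouched. The helper name `collapse_cite_newlines` and the sample text are hypothetical, and the sketch assumes the cite templates contain no nested `{{...}}` templates.

```python
import re

# One whole {{cite web ...}} template, matched lazily across lines.
# Assumption: no nested {{...}} inside the template.
CITE_RE = re.compile(r"\{\{cite web.*?\}\}", flags=re.DOTALL)

# Any run of whitespace that contains at least one line break.
NEWLINE_WS_RE = re.compile(r"\s*[\r\n]\s*")


def collapse_cite_newlines(wikitext):
    """Strip line breaks inside each {{cite web}} template (hypothetical helper)."""
    return CITE_RE.sub(lambda m: NEWLINE_WS_RE.sub("", m.group(0)), wikitext)


sample = "Intro\n{{cite web\n|title=Testing\n|url=Testing\n}}\nOutro\n"
print(collapse_cite_newlines(sample))
```

Compiling both patterns once keeps the nested substitution readable, and text outside the templates keeps its original line breaks.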
