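Removing matching braces with a Python regular expression
I have a LaTeX file in which a lot of text is marked with \red{}, but there may be further braces inside the \red{}, for example \red{here is \underline{underlined} text}. I want to remove the red markup, and after some searching I wrote this Python script: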
import re
import sys
# Start the program in a terminal with
#   python RedRemover.py filename
# sys.argv[1] then has the value filename
ifn = sys.argv[1]
# Open the file and read it; the whole content is then stored in the string c
with open(ifn, "r") as f:
    c = f.read()
# Remove occurrences of \red{...} in c
c = re.sub(r'\\red\{(?:[^\}|]*\|)?([^\}|]*)\}', r'\1', c)
# Write c into a new file
with open("RedRemoved_" + ifn, "w") as nf:
    nf.write(c)
But this script turns
\red{here is \underline{underlined} text}
into
here is \underline{underlined text}
which is not what I want. What I want is
here is \underline{underlined} text
2 Answers
1
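I think you need to keep the curly braces. Consider \red{\bf test}: without the braces, \bf would also apply to the text that follows. Both variants: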
import re
c = r'\red{here is \underline{underlined} text} and \red{more}'
d = c
# this may be less painful and sufficient, and even more correct:
# drop only the \red command and keep its braces
c = re.sub(r'\\red\b', r'', c)
print("1ST:", c)
# if you want to get rid of the curlies as well
# (handles at most one level of nested braces):
d = re.sub(r'\\red\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}', r'\1', d)
print("2ND:", d)
The output is:
1ST: {here is \underline{underlined} text} and {more}
2ND: here is \underline{underlined} text and more
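The second pattern only allows a single level of braces inside \red{...}. The following is a small sketch of that limit, not part of the original answer: the names ONE_LEVEL and strip_red_one_level and the second sample string are made up for illustration, and the pattern is written with escaped braces and [^{}] classes.

```python
import re

# Hypothetical name: matches \red{...} whose content nests braces
# at most one level deep, capturing the content in group 1.
ONE_LEVEL = re.compile(r'\\red\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}')

def strip_red_one_level(text):
    """Remove \\red{...} wrappers that contain at most one level of braces."""
    return ONE_LEVEL.sub(r'\1', text)

# One level of nesting: the wrapper is removed.
one = strip_red_one_level(r'\red{here is \underline{underlined} text}')
# Two levels of nesting: the pattern cannot match, so nothing changes.
two = strip_red_one_level(r'\red{a \underline{b \emph{c} d} e}')

print(one)
print(two)
```

With two levels of nesting the pattern finds no match at all, so the text is left untouched rather than being mangled.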
6
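You cannot match braces nested to an arbitrary depth with the re module, because it does not support recursion. To solve this, you can use the newer regex module: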
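# regex is a third-party module (pip install regex), not part of the standard library
import regex
c = r'\red{here is \underline{underlined} text}'
# group 1 is the braced block {...}, group 2 is its content
c = regex.sub(r'\\red({((?>[^{}]+|(?1))*)})', r'\2', c)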
Here (?1) is a recursive call to capture group 1, so a braced block can contain further braced blocks at any depth.
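If installing a third-party module is not an option, the same effect can be had without regular expressions by counting brace depth. This is only a sketch under some assumptions: the function name strip_macro is made up, escaped braces (\{ and \}) and LaTeX comments are not treated specially, and the macro must be followed directly by its opening brace.

```python
def strip_macro(text, macro=r'\red'):
    """Remove every macro{...} wrapper, keeping its content, at any nesting depth."""
    opener = macro + '{'
    out = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            # Find the brace that closes this wrapper by counting depth.
            depth = 1
            j = i + len(opener)
            while j < len(text) and depth > 0:
                if text[j] == '{':
                    depth += 1
                elif text[j] == '}':
                    depth -= 1
                j += 1
            if depth == 0:
                # Recurse so that a nested wrapper inside the content is removed too.
                out.append(strip_macro(text[i + len(opener):j - 1], macro))
                i = j
                continue
        # Not a wrapper, or the braces are unbalanced: copy the character as is.
        out.append(text[i])
        i += 1
    return ''.join(out)

sample = r'\red{here is \underline{underlined} text} and \red{more}'
result = strip_macro(sample)
print(result)
```

A wrapper whose braces never balance is left in place, which seems safer than guessing where it was meant to end.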