Python regex: nested search and replace
I need to find and replace every comma that appears inside quoted text.
That is,
"thing1,blah","thing2,blah","thing3,blah",thing4
needs to become
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
My code:
import re

inFile = open(inFileName,'r')
inFileRl = inFile.readlines()
inFile.close()
p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
            pass
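I want to pick out the columns I care about with a regex, filter them further,
then do a regex replacement, and finally put the line back together.
How can I do this in Python?
5 Answers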
0
You can try this regular expression.
>>> import re
>>> re.sub('(?<!"),(?!")', r"\\,",
...        '"thing1,blah","thing2,blah","thing3,blah",thing4')
#Gives "thing1\,blah","thing2\,blah","thing3\,blah",thing4
The logic is: if a , is neither immediately preceded nor immediately followed by a " , replace it with \, .
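To see what the two lookarounds actually check, it can help to run the same pattern on a second input. The following is only a sketch: `escape_commas` is a hypothetical helper name introduced here, and the second input is made up for illustration. Because the pattern looks only at the characters directly next to each comma, it does not know whether a comma is inside quotes, so a comma between two unquoted fields is escaped as well.

```python
import re

# A comma that has no double quote directly before it and none directly after it.
pattern = re.compile(r'(?<!"),(?!")')

def escape_commas(line):
    # Hypothetical helper: put a backslash in front of every matching comma.
    return pattern.sub(r'\\,', line)

print(escape_commas('"thing1,blah","thing2,blah",thing4'))
# "thing1\,blah","thing2\,blah",thing4

# The comma between the unquoted fields c and d is escaped too.
print(escape_commas('"a,b",c,d'))
# "a\,b",c\,d
```

If unquoted fields in your data can sit next to each other, an approach that first isolates the quoted fields is probably safer.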
1
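General edit
The question previously contained this passage
"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4
but it is no longer there.
Also, I had not noticed the r'[^\\],' part before.
So I have completely rewritten my answer.
"thing1,blah","thing2,blah","thing3,blah",thing4
and
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
show the contents of the strings, not their repr (I suppose).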
import re

# Raw string, so the backslash before the third comma is kept literally
ss = r'"thing1,blah","thing2,blah","thing3\,blah",thing4 '
regx = re.compile('"[^"]*"')

def repl(mat, ri=re.compile(r'(?<!\\),')):
    # Escape each comma in the quoted field that is not already escaped
    return ri.sub(r'\\,', mat.group())

print(ss)
print(repr(ss))
print()
print(regx.sub(repl, ss))
print(repr(regx.sub(repl, ss)))
Result
"thing1,blah","thing2,blah","thing3\,blah",thing4 
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '

"thing1\,blah","thing2\,blah","thing3\,blah",thing4 
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '
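The question asks for this to be applied line by line, so here is a minimal sketch of the same two-stage idea wrapped in a function. The names `QUOTED`, `BARE_COMMA` and `escape_quoted_commas` are hypothetical and chosen only for this example, and the sample lines are made up. It assumes quoted fields never contain a double quote themselves.

```python
import re

QUOTED = re.compile(r'"[^"]*"')       # one double-quoted field
BARE_COMMA = re.compile(r'(?<!\\),')  # a comma not already preceded by a backslash

def escape_quoted_commas(line):
    # Outer sub finds each quoted field; inner sub escapes the commas inside it.
    return QUOTED.sub(lambda m: BARE_COMMA.sub(r'\\,', m.group()), line)

lines = [
    '"thing1,blah","thing2,blah","thing3,blah",thing4',
    r'"already\,escaped",plain,"x,y,z"',
]
for line in lines:
    print(escape_quoted_commas(line))
```

Commas outside the quotes are left alone, and running the function a second time on its own output should change nothing, since the lookbehind skips commas that already have a backslash.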
3
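The csv module is well suited to this kind of data, because csv.reader with its default settings does not split on commas enclosed in quotes, and csv.writer puts the quotes back because the fields contain commas. I used StringIO to make the string look like a file, which makes it easier to work with.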
import csv
import io

s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = io.StringIO(s)
dest = io.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    # Undo existing escapes first so already-escaped commas are not doubled
    wtr.writerow([item.replace('\\,', ',').replace(',', '\\,') for item in row])
print(dest.getvalue())
Result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"