Nested regex search and replace in Python

2 votes
5 answers
534 views
Asked 2025-04-17 03:39

I need to find and replace every comma that appears inside quotes.
That is,

"thing1,blah","thing2,blah","thing3,blah",thing4  

needs to become

"thing1\,blah","thing2\,blah","thing3\,blah",thing4  

My code:

import re

# inFileName is assumed to hold the path of the input file
with open(inFileName, 'r') as inFile:
    inFileRl = inFile.readlines()

p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q  = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
            pass

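I need to use a regex to pick out the columns I'm interested in, filter those further,
do the regex replacement, and finally put the line back together.

How can I do this in Python?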

5 Answers

0

You can try this regex.


>>> import re
>>> print(re.sub(r'(?<!"),(?!")', r"\\,",
...              '"thing1,blah","thing2,blah","thing3,blah",thing4'))
"thing1\,blah","thing2\,blah","thing3\,blah",thing4

The logic is: if a , has no " immediately before it and no " immediately after it, replace it with \,
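
Note that the lookarounds only inspect the characters right next to each comma; they do not check whether the comma is actually inside a quoted field. A small sketch of where that matters (the helper name `escape_commas` and the second sample line are made up for illustration):

```python
import re

def escape_commas(line):
    # Escape every comma that has no double quote immediately before or after it.
    return re.sub(r'(?<!"),(?!")', r'\\,', line)

# The sample from the question: only the commas inside quotes are escaped.
print(escape_commas('"thing1,blah","thing2,blah","thing3,blah",thing4'))
# "thing1\,blah","thing2\,blah","thing3\,blah",thing4

# Two adjacent unquoted fields: the separator between them is escaped as well.
print(escape_commas('"a,b",c,d'))
# "a\,b",c\,d
```

So this pattern appears to be safe only when every separator comma touches a quote character on at least one side, which happens to hold for the sample line in the question.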

1

General edit

The question used to contain

"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4

but it no longer does.

Also, I had not noticed the r'[^\\],' part before.

So I have completely rewritten my answer.

"thing1,blah","thing2,blah","thing3,blah",thing4

and

"thing1\,blah","thing2\,blah","thing3\,blah",thing4

show the contents of the strings (I assume).

import re

# Raw string, so \, stays a literal backslash followed by a comma
ss = r'"thing1,blah","thing2,blah","thing3\,blah",thing4 '

# Matches one double-quoted field
regx = re.compile('"[^"]*"')

def repl(mat, ri=re.compile(r'(?<!\\),')):
    # Inside the quoted field, escape each comma not already preceded by a backslash
    return ri.sub(r'\\,', mat.group())

print(ss)
print(repr(ss))
print()
print(regx.sub(repl, ss))
print(repr(regx.sub(repl, ss)))

Result

"thing1,blah","thing2,blah","thing3\,blah",thing4 
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '

"thing1\,blah","thing2\,blah","thing3\,blah",thing4 
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '
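
To tie this back to the loop in the question: the callback idea (re.sub with a function as the replacement) can be applied line by line, so each line is rebuilt with only its quoted fields changed. A minimal sketch, assuming the lines are already in memory; `escape_quoted_commas` and `sample_lines` are names invented here:

```python
import re

QUOTED = re.compile(r'"[^"]*"')            # one double-quoted field
UNESCAPED_COMMA = re.compile(r'(?<!\\),')  # a comma not preceded by a backslash

def escape_quoted_commas(line):
    # re.sub calls the function for each quoted field and splices the result
    # back into the line, so text outside the quotes is left untouched.
    return QUOTED.sub(lambda m: UNESCAPED_COMMA.sub(r'\\,', m.group()), line)

sample_lines = [
    '"thing1,blah","thing2,blah","thing3,blah",thing4',
    r'"already\,escaped",plain,"x,y"',
]
for line in sample_lines:
    print(escape_quoted_commas(line))
# "thing1\,blah","thing2\,blah","thing3\,blah",thing4
# "already\,escaped",plain,"x\,y"
```

With a real file, the same function could be called on each line returned by readlines() before writing the line out again.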

3

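The csv module is well suited to this kind of data: with its default settings, csv.reader treats commas inside quotes as part of the field rather than as separators, and csv.writer puts the quotes back because the fields contain commas. I used StringIO to make the string look like a file, which makes it easier to work with.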

import csv
import io

s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = io.StringIO(s)
dest = io.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    # Unescape first so that already-escaped commas are not escaped twice
    wtr.writerow([item.replace('\\,', ',').replace(',', '\\,') for item in row])
print(dest.getvalue())

Result:

"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"

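
One thing to be aware of with the csv approach: with the default csv.QUOTE_MINIMAL setting, the writer only quotes fields that need it, so the original quoting is not necessarily preserved. A sketch that wraps the same reader/writer logic in a helper (the name `escape_with_csv` is made up here, and it assumes Python 3's io.StringIO):

```python
import csv
import io

def escape_with_csv(text):
    dest = io.StringIO()
    writer = csv.writer(dest, lineterminator='\n')
    for row in csv.reader(io.StringIO(text)):
        # Undo any existing escapes first so commas are not escaped twice
        writer.writerow([item.replace('\\,', ',').replace(',', '\\,') for item in row])
    return dest.getvalue()

# thing4 stays unquoted because it contains no comma
print(escape_with_csv('"thing1,blah","thing2,blah",thing4'), end='')
# "thing1\,blah","thing2\,blah",thing4

# A field that was quoted but does not need quoting loses its quotes
print(escape_with_csv('"plain","x,y"'), end='')
# plain,"x\,y"
```

If the original quoting has to survive exactly, the regex callback approach may be the better fit; passing quoting=csv.QUOTE_ALL to csv.writer would instead quote every field, including ones that were unquoted in the input.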