Python re.sub and re.match not matching the same thing?

Posted 2024-03-29 14:40:53


I am trying to remove all instances of the following string from a file:

{ "userID":(some 6 digit number), "array":[]},

Specifically, I want to find every such substring and replace it with nothing ("").

I started with re.match to make sure my expression was correct:

matchObj = re.match( r'({.*?"array":\[\]\},?)', g)

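In this expression, I thought I had turned off greedy matching twice. But when I moved to re.sub, it matched many parts of the string that I did not expect it to match. Specifically, these lines: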

matchObj = re.match( r'({.*?"array":\[\]\},?)', g)
ggg =  re.sub( r'({.*?"array":\[\]\},?)', '', g)

for this value of g:

g = 'fedsgedsgs {"all": [{"userID": 777, "array":[]},azgagaga{"userID": 777, "array":[{"expand":"abs","id":503711372,"sport":18,"start_time":"2015-04-15T16:11:12.000Z","local_start_time":"2015-04-15T17:11:12.000Z","distance":4.281959056854248,"duration":2\
891.0,"speed_avg":5.332083225415182,"speed_max":6.74372,"altitude_min":27.0,"altitude_max":61.0,"ascent":80.0,"descent":86.0},{"expand":"abs","id":470811412,"sport":18,"start_time":"2015-02-11T09:27:10.000Z","local_start_time":"2015-02-\
11T10:27:10.000Z","distance":0.0,"duration":0.0},{"expand":"abs","id":470755226,"sport":18,"start_time":"2015-02-11T09:25:04.000Z","local_start_time":"2015-02-11T10:25:04.000Z","distance":0.0,"duration":0.0,"speed_max":0.0,"altitude_min\
":45.0,"altitude_max":45.0},{"expand":"abs","id":470749841,"sport":18,"start_time":"2015-02-11T09:10:43.000Z","local_start_time":"2015-02-11T10:10:43.000Z","distance":0.7858999967575073,"duration":479.0,"speed_avg":5.90655529922135,"spe\
ed_max":6.82629,"altitude_min":35.0,"altitude_max":57.0,"ascent":45.0,"descent":32.0}]},{"userID": 777, "array":[{"expand":"abs","id":470745921,"sport":0,"start_time":"2015-02-11T09:00:48.000Z","local_start_time":"2015-02-11T15:00:48.00\
0Z","distance":0.0,"duration":15.0,"speed_avg":0.0}]},{"userID": 777, "array":[{"expand":"abs","id":498050248,"sport":2,"start_time":"2015-04-06T14:00:03.000Z","local_start_time":"2015-04-06T19:00:03.000Z","distance":16.55500030517578,"\
duration":2793.51,"speed_avg":21.334450601083514,"speed_max":36.3397,"altitude_min":1.8,"altitude_max":35.5,"ascent":50.7,"descent":61.8},{"expand":"abs","id":498049916,"sport":2,"start_time":"2015-04-06T13:59:35.000Z","local_start_time\
":"2015-04-06T18:59:35.000Z","distance":0.010999999940395355,"duration":10.2,"speed_avg":3.882352920139537,"speed_max":2.072,"altitude_min":8.4,"altitude_max":8.4,"ascent":0.0,"descent":0.0},{"expand":"abs","id":486139822,"sport":2,"sta\
rt_time":"2015-03-15T00:21:08.000Z","local_start_time":"2015-03-15T06:21:08.000Z","distance":23.302000045776367,"duration":3997.54,"speed_avg":20.984705635164357,"speed_max":38.4344,"altitude_min":-7.3,"altitude_max":14.6,"ascent":20.1,\
"descent":42.1},{"expand":"abs","id":486139782,"sport":2,"start_time":"2015-03-15T00:20:50.000Z","local_start_time":"2015-03-15T06:20:50.000Z","distance":0.0,"duration":2.99,"speed_avg":0.0,"speed_max":0.0,"altitude_min":4.8,"altitude_m\
ax":4.8,"ascent":0.0,"descent":0 {"userID": 777, "array":[]}, mmmmmmmm {"userID": 7767, "array":[]}, gggggggg {"userID": 74577, "array":[]}, ggggggggggggggg {"userID": 774447, "array":[]}, hrdshe {"userID": 722277, "array":[]},'

produce this output for ggg:

In[37]:   ggg
Out[37]: 'fedsgedsgs azgagaga mmmmmmmm  gggggggg  ggggggggggggggg  hrdshe '

The expression is also replacing substrings of this form with "":

  { "userID":(some 6 digit number), "array":[lots of json objects printed here.....]},

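But I want to leave those substrings (the ones with non-empty arrays) unchanged.

I tried removing the escapes from \[\], since I only want to match "[]", but then I got an error message saying my expression was incomplete. Why am I matching [....stuff....] with junk inside, and how do I match only "[]"?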

Update

So this works:

ggg = re.sub(r'"userID": [0-9]{6,6}, "array":\[\]},', 'FOUND IT', g)

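Somehow, greediness does not seem to be the problem. If someone can explain why the above works but my original attempt does not, I would really like to know.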


2 Answers

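I think you misunderstand greedy versus non-greedy. Non-greedy does not forbid the regex from matching from a { all the way to a distant "array":[]},.

Non-greedy only means that, from a given starting {, the match stops at the nearest "array":[]}, that follows. The match still begins at the leftmost {, so every object with a non-empty array that sits between that { and the next empty array is swallowed too.

You can replace the . in your .*? with [^}] to explicitly prevent the * from "escaping" the {} pair.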
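A minimal sketch of the difference, using a shortened, hypothetical stand-in for the g string from the question (the sample text and variable names are assumptions for illustration only). The lazy pattern starts at the first { and runs until the first empty array, while the [^}] version cannot cross a closing brace:

```python
import re

# Hypothetical, shortened stand-in for the g string in the question.
sample = 'aa {"userID": 1, "array":[{"id": 5}]}, bb {"userID": 2, "array":[]}, cc'

# Original style: lazy dot, free to run past closing braces.
lazy = r'\{.*?"array":\[\]\},?'

# Suggested fix: [^}] instead of the dot, so the match stays inside one {} pair.
bounded = r'\{[^}]*?"array":\[\]\},?'

lazy_result = re.sub(lazy, '', sample)
bounded_result = re.sub(bounded, '', sample)

print(repr(lazy_result))     # the non-empty object is removed as well
print(repr(bounded_result))  # only the empty-array object is removed
```

With the lazy pattern, the object with userID 1 disappears together with the empty one; with [^}], only the object whose array is empty is removed.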

But why not use json.loads, clean up the data, and then use json.dumps? What happens when there are spaces or newlines around a : or a } (which is still valid JSON)?
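As a rough sketch of that parse, clean, and re-serialize approach: the input below is a hypothetical, well-formed JSON document shaped like the data in the question (the real g string is not valid JSON as posted, so the "all" key and the sample values are assumptions).

```python
import json

# Hypothetical, well-formed JSON resembling the data in the question.
raw = '{"all": [{"userID": 777, "array": []}, {"userID": 778, "array": [{"id": 1}]}]}'

data = json.loads(raw)

# Keep only the entries whose "array" is non-empty (an empty list is falsy).
data["all"] = [entry for entry in data["all"] if entry["array"]]

cleaned = json.dumps(data)
print(cleaned)
```

Because the filter works on parsed objects, extra whitespace or newlines in the input do not affect the result.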

re.match() is implicitly anchored. That is:

re.match('foo', content)   # find foo only at the beginning of content

...is the same as...

re.match('^foo', content)  # find foo only at the beginning of content

...whereas:

re.sub('foo', 'bar', content) # replace foo with bar everywhere in content

...is implicitly unanchored, which makes it behave like

re.search('foo', content) # find foo everywhere in content

...which will find foo anywhere in content.


So, to make a regex used with re.sub() behave the same as it does with re.match(), add an explicit ^ anchor.
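A small sketch of the anchoring difference, using a hypothetical sample string (the variable names are illustrative, not from the question):

```python
import re

content = 'xx foo yy foo'  # hypothetical sample text

# re.match only tries the pattern at position 0, so this finds nothing.
at_start = re.match('foo', content)

# re.search scans forward and reports the first occurrence anywhere.
anywhere = re.search('foo', content)

# re.sub is unanchored as well: it replaces every occurrence.
replaced_all = re.sub('foo', 'bar', content)

# With an explicit ^, re.sub is restricted to the start, like re.match.
replaced_anchored = re.sub('^foo', 'bar', content)

print(at_start)
print(anywhere.start())
print(replaced_all)
print(replaced_anchored)
```

This is why a pattern that looks right under re.match() can touch far more text once it is handed to re.sub().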


(By the way, trying to modify JSON this way is doomed to end in pain and suffering. Parse, update, and re-serialize, or you will face a string of unnecessary bugs.)
