Python regex: extract a value only from lines that match a specific string

Posted 2024-04-23 21:06:10


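I'm having trouble finding the right regular expression for this task, so please excuse my beginner skills. All I want is to get the id value from each line where "available" is true, not false. I can already get the ids of all lines with re.findall('"id":(\d{13})', line, re.DOTALL) (the {13} matches exactly 13 digits, because the code also contains other ids with fewer than 13 digits, which I don't need).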

{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},

So the final result needs to be ['1651572973431','1351572943231']

Thanks for your help.


3 answers

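This may not be a good answer, since it depends on what exactly you have. It looks like you have a list of strings and want the id from some of them. If so, parsing the JSON will be cleaner and easier to read than writing a Byzantine regex. For example: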

import json

# lines is a list of strings:

lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]

# parse each string as JSON, then filter with ordinary Python:
ids = [record['id'] for record in map(json.loads, lines) if record['available']]
print(ids)

Result:

[1351572943231, 1651572973431]

If what you posted is one long string, you can wrap it in [] and parse it as a single array, with the same result:

import json

line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'

# wrapping the text in [] turns the comma-separated objects into one JSON array
records = json.loads('[' + line + ']')
print([record['id'] for record in records if record['available']])
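
If the text is instead processed line by line exactly as it appears in the question, each line ends with a trailing comma, which json.loads would reject. Below is a minimal sketch under stated assumptions: the input is a multi-line string (the names raw and available_ids are made up for illustration), one object per line, and the ids should come back as strings as in the question.

```python
import json

# Hypothetical input: text shaped like the question's data, one object per
# line, each line ending with a trailing comma.
raw = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''


def available_ids(text):
    """Return the ids (as strings) of the objects whose "available" is true."""
    ids = []
    for line in text.splitlines():
        line = line.strip().rstrip(',')  # drop whitespace and the trailing comma
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get('available') is True:
            ids.append(str(record['id']))
    return ids


print(available_ids(raw))
```

Using record.get('available') is True rather than plain truthiness is a design choice here: it skips records where the key is missing or holds something other than a JSON true.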

This matches what you want:

(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)

https://regex101.com/r/FseimH/1

Expanded

 (?<= "id": )
 \d{13} 
 (?=
      (?: ," [^"]* ": [^,]*? )*?
      ,"available":true
 )

Explanation

 (?<= "id": )                        # Lookbehind assertion for id
 \d{13}                              # Consume 13 digit id
 (?=                                 # Lookahead assertion
      (?:                                 # Optional sequence
           ,                                   # comma
           " [^"]* "                           # quoted string
           :                                   # colon
           [^,]*?                              # optional non-comma characters
      )*?                                 # End sequence, do 0 to many times
      ,"available":true                   # until we find available = true
 )
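
To try this pattern from Python, one option is re.VERBOSE, which ignores whitespace in the pattern and allows inline comments, so the expanded layout can be kept. This is only a sketch: the names PATTERN and sample are made up for illustration, and the sample text is a shortened copy of the question's data.

```python
import re

# The answer's pattern, laid out with re.VERBOSE. Whitespace inside the
# pattern is ignored in this mode, so none of it is matched literally.
PATTERN = re.compile(r'''
    (?<="id":)                  # lookbehind: must be preceded by "id":
    \d{13}                      # the 13-digit id
    (?=                         # lookahead
        (?:,"[^"]*":[^,]*?)*?   # zero or more ,"key":value pairs
        ,"available":true       # then available = true
    )
''', re.VERBOSE)

# Hypothetical sample input, modelled on the question's data.
sample = (
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},\n'
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678"},\n'
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},\n'
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678"},\n'
)

matches = PATTERN.findall(sample)
print(matches)
```

Note that the lookahead assumes values contain no commas; a string value with a comma in it would stop the scan early, which is one reason parsing the JSON can be more robust.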

Here, we can simply use "id" as the left boundary and collect the desired digits in a capture group:

"id":([0-9]+)


Then we can keep adding boundaries to it. For example, if exactly 13 digits are needed, we can simply use:

\"id\":([0-9]{13})
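
On its own, this pattern matches the id on every line, including those with "available":false. A minimal sketch of one way to combine it with the filter the question asks for, assuming each record sits on its own line and the flag appears literally as "available":true with no extra spaces (ids_from_available_lines and sample are hypothetical names):

```python
import re

# Capture group around the 13-digit id, with "id": as the left boundary.
ID_PATTERN = re.compile(r'"id":([0-9]{13})')


def ids_from_available_lines(text):
    """Collect ids only from lines containing the literal "available":true."""
    ids = []
    for line in text.splitlines():
        if '"available":true' not in line:
            continue  # skip lines that are not marked available
        match = ID_PATTERN.search(line)
        if match:
            ids.append(match.group(1))
    return ids


# Hypothetical sample input, modelled on the question's data.
sample = (
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},\n'
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},\n'
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678"},\n'
)

print(ids_from_available_lines(sample))
```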
