How to use a regex when a field contains spaces instead of a number

Published 2024-04-27 00:25:57


I have been trying to get the following result:

[('08/03/2019', '', '58', '71', '162', '', '1', '71.68', '69.03', '441381.64', '2829.37', '14', '1', '226', '2', '224', '62', '271')]

My approach works as long as no value is blank between two numbers.

The original row and its result look like this:

'08/03/2019     175   58   71  162|    5     1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|'

[('08/03/2019', '175', '58', '71', '162', '5', '1', '71.68', '69.03', '441381.64', '2829.37', '14', '1', '226', '2', '224', '62', '271')]

The pattern I used is:

re.compile(r"([0-9]{2}\/[0-9]{2}\/[0-9]{4})\s{5}(\d+)\s{3}(\d+)\s{3}(\d+)\s{2}(\d+)[|]\s{4}(\d+)\s{5}(\d+)[|]\s{1}(\d+[.]\d+)\s{1}(\d+[.]\d+)[|]\s{2}(\d+[.]\d+)\s{4}(\d+[.]\d+)[|]\s{3}(\d+)\s{5}(\d+)[|]\s{2}(\d+)\s{4}(\d+)\s{2}(\d+)\s{3}(\d+)[|]\s{3}(\d+)")        

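The problem arises when blanks appear in the original dataset. For example, when 175 and 5 are missing, the re.compile pattern no longer matches the row: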

'08/03/2019        58   71  162|         1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|'

Splitting on (\s+) or \s+ doesn't help, because the gaps between columns have different widths: 5, 3, 3, 2, 4, 5, 1, 1, 2, 4, 3, 5, 2, 4, 2, 3, and 3 spaces.
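To make the failure concrete, here is a minimal sketch (the variable names are illustrative and not part of the original post) that tokenizes both sample rows while ignoring the gap widths. The row with blanks simply yields fewer tokens, and nothing indicates which columns they belong to:

```python
import re

full_row = "08/03/2019     175   58   71  162|    5     1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|"
blank_row = "08/03/2019        58   71  162|         1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|"

# Grab every run of digits, dots, and slashes, ignoring spaces and pipes.
full_tokens = re.findall(r"[\d./]+", full_row)
blank_tokens = re.findall(r"[\d./]+", blank_row)

print(len(full_tokens))   # 18
print(len(blank_tokens))  # 16

# The second token is 175 in the full row but 58 in the row with blanks,
# so every later column is shifted.
print(full_tokens[1], blank_tokens[1])  # 175 58
```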


1 Answer
Answer 1 · posted 2024-04-27 00:25:57

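The expression you designed looks great. You may just want to add a ? after each capture group whose value might be missing, which would likely solve the problem you are facing.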


For example, here we add two ? quantifiers, one after the second capture group and one after the sixth:

import re

expression = r"([0-9]{2}\/[0-9]{2}\/[0-9]{4})\s{5}(\d+)?\s{3}(\d+)\s{3}(\d+)\s{2}(\d+)[|]\s{4}(\d+)?\s{5}(\d+)[|]\s{1}(\d+[.]\d+)\s{1}(\d+[.]\d+)[|]\s{2}(\d+[.]\d+)\s{4}(\d+[.]\d+)[|]\s{3}(\d+)\s{5}(\d+)[|]\s{2}(\d+)\s{4}(\d+)\s{2}(\d+)\s{3}(\d+)[|]\s{3}(\d+)"

string = """
08/03/2019        58   71  162|         1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|

08/03/2019     175   58   71  162|    5     1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|

"""


print(re.findall(expression, string))

Output

[('08/03/2019', '', '58', '71', '162', '', '1', '71.68', '69.03', '441381.64', '2829.37', '14', '1', '226', '2', '224', '62', '271'), ('08/03/2019', '175', '58', '71', '162', '5', '1', '71.68', '69.03', '441381.64', '2829.37', '14', '1', '226', '2', '224', '62', '271')]
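Note that re.findall reports an optional group that did not participate as an empty string. If typed values are more convenient downstream, one possible follow-up step is sketched below; the helper name clean_row and the choice of float and None are assumptions for illustration, not part of the original answer:

```python
import re

# Same expression as above, with ? after the two groups that may be blank.
expression = r"([0-9]{2}\/[0-9]{2}\/[0-9]{4})\s{5}(\d+)?\s{3}(\d+)\s{3}(\d+)\s{2}(\d+)[|]\s{4}(\d+)?\s{5}(\d+)[|]\s{1}(\d+[.]\d+)\s{1}(\d+[.]\d+)[|]\s{2}(\d+[.]\d+)\s{4}(\d+[.]\d+)[|]\s{3}(\d+)\s{5}(\d+)[|]\s{2}(\d+)\s{4}(\d+)\s{2}(\d+)\s{3}(\d+)[|]\s{3}(\d+)"

string = """
08/03/2019        58   71  162|         1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|

08/03/2019     175   58   71  162|    5     1| 71.68 69.03|  441381.64    2829.37|   14     1|  226    2  224   62|   271|

"""


def clean_row(row):
    """Keep the date as text; turn '' into None and other fields into floats."""
    date, *values = row
    return (date, *[float(v) if v else None for v in values])


cleaned = [clean_row(row) for row in re.findall(expression, string)]

for row in cleaned:
    print(row[:7])
# ('08/03/2019', None, 58.0, 71.0, 162.0, None, 1.0)
# ('08/03/2019', 175.0, 58.0, 71.0, 162.0, 5.0, 1.0)
```

Using None rather than 0 keeps a missing value distinguishable from a genuine zero in the data.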

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel, and you can also see there how it matches against sample inputs.

