Regular expression to extract multi-line hash comments

Posted 2024-05-14 23:15:05


I'm currently suffering from writer's block while trying to come up with an elegant solution to this problem.

Take the following example:

{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}

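From the above, I would like to extract the code comments as groups rather than individually. Lines are grouped when one comment line immediately follows another. A comment always starts with whitespace followed by a #.

Example result: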

[example result missing from the source]

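I could iterate over the lines and evaluate them in code, but I would prefer a regular expression if possible. If you feel a regular expression is not the right tool for this problem, please explain why.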
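For reference, the line-by-line approach mentioned above might look like the following sketch. The function name comment_blocks and the variable sample are illustrative and do not come from the original post; the sketch assumes that any line whose first non-whitespace character is # counts as a comment.

```python
def comment_blocks(text):
    """Group consecutive '#' comment lines into newline-joined blocks."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith('#'):
            # Drop the leading '#' and any surrounding whitespace.
            current.append(stripped[1:].strip())
        elif current:
            # A non-comment line ends the block collected so far.
            blocks.append('\n'.join(current))
            current = []
    if current:
        # Flush a block that runs to the end of the input.
        blocks.append('\n'.join(current))
    return blocks


sample = '''{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}'''

print(comment_blocks(sample))
```

Because only the start of each stripped line is tested, the # inside the field3 value is not treated as a comment.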

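Summary:

Thanks to everyone who submitted solutions to this problem; it is a great example of how helpful the SO community can be. I will spend an hour of my own time answering other questions to make up for the collective time spent on this one.

Hopefully this thread will help others in the future as well.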


3 answers

It is not possible with a pure regex, but you can get away with one line of code :)

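import re

text = """{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX"
    # Some information about field 1
    # on multiple lines
    # Some information about field 1
    # on multiple lines
    "field3": "#this would be ignored"
  }
}"""

# The first alternative consumes whole non-comment lines; the second consumes
# the leading whitespace and '#' of a comment line. Replacing every match with
# a newline leaves only the comment text, with blocks separated by blank lines.
rex = re.compile(r"(^(?!\s*#.*?[\r\n]+)(.*?)([\r\n]+|$)|[\r\n]*^\s*#\s*)+", re.MULTILINE)
print(rex.sub("\n", text).strip().split('\n\n'))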

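Output:

['Some information about field 1\non multiple lines', 'Some more info on a single line', 'Some information about field 1\non multiple lines\nSome information about field 1\non multiple lines']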
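Matching a whole run of consecutive comment lines is within reach of a single pattern; the part that needs a second step is stripping the # prefixes from inside each match. A minimal sketch of that two-step idea follows. The names COMMENT_BLOCK, COMMENT_PREFIX, extract_blocks and sample are illustrative rather than taken from the answers, and the sketch assumes "\n" line endings.

```python
import re

# One or more consecutive lines made of optional indentation, '#', and text.
COMMENT_BLOCK = re.compile(r'^[ \t]*#.*(?:\n[ \t]*#.*)*', re.MULTILINE)
# The indentation-plus-'#' prefix at the start of each line inside a block.
COMMENT_PREFIX = re.compile(r'^[ \t]*#[ \t]*', re.MULTILINE)


def extract_blocks(text):
    """Return each run of comment lines as one string, prefixes removed."""
    return [
        COMMENT_PREFIX.sub('', match.group(0)).rstrip()
        for match in COMMENT_BLOCK.finditer(text)
    ]


sample = '''{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}'''

print(extract_blocks(sample))
```

Anchoring the pattern with ^ in MULTILINE mode is what keeps the # inside the field3 string from being picked up, since that # is not the first non-whitespace character on its line.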

You can use a deque to hold two lines at a time and add some logic to partition the comments into blocks:

src='''\
{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",


    # multiple line comments
    # supported
    # as well 
    "field3": "#this would be ignored"

  }
}
'''

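from collections import deque

# Sliding window of two stripped lines: d[0] is the previous line, d[-1] the
# current one. Seeding it with '' keeps a comment on the first line from
# being compared with itself.
d = deque([''], 2)
blocks = []
block = []
for line in src.splitlines():
    d.append(line.strip())
    if d[-1].startswith('#'):
        comment = line.partition('#')[2]
        if d[0].startswith('#'):
            block.append(comment)   # previous line was a comment: extend the block
        else:
            block = [comment]       # start a new block
    elif d[0].startswith('#'):
        blocks.append(block)        # a comment block has just ended
if d[-1].startswith('#'):
    blocks.append(block)            # flush a block that runs to the end of the input

for i, b in enumerate(blocks):
    print('block {}: \n{}'.format(i, '\n'.join(b)))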

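Prints:

block 0: 
 Some information about field 1
 on multiple lines
block 1: 
 Some more info on a single line
block 2: 
 multiple line comments
 supported
 as well 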

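You can use re.findall with the following regex, where s is the input text. The pattern does not anchor # to the start of a line, so it assumes no other # characters appear in the input: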

>>> m= re.findall(r'\s*#(.*)\s*#(.*)|#(.*)[^#]*',s,re.MULTILINE)
[(' Some information about field 1', ' on multiple lines', ''), ('', '', ' Some more info on a single line')]

To print the result, you can do:

[code block missing from the source]

For comment blocks of more than two lines, however, you can use itertools.groupby:

s="""{
  "data": {
    # Some information about field 1
    # on multiple lines
    # threeeeeeeeecomment
    "field1": "XXXXXXXXXX"

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}"""
from itertools import groupby

def is_comment(line):
    return line.strip().startswith('#')

# groupby yields runs of consecutive lines that share the same key;
# keep only the runs made of comment lines.
comments = [list(run) for is_c, run in groupby(s.split('\n'), is_comment) if is_c]

for i, group in enumerate(comments, 1):
    cleaned = [t.strip(' #') for t in group]
    print('group {} :{}'.format(i, ' & '.join(cleaned)))

Result:

group 1 :Some information about field 1 & on multiple lines & threeeeeeeeecomment
group 2 :Some more info on a single line
