Regular expression to extract multi-line hash comments

{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}

3 answers

User

Answer 1 · edited 2024-05-14 23:15:05

This cannot be done with a regex alone, but you can get away with a one-liner:

import re

text = """{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX"
    # Some information about field 1
    # on multiple lines
    # Some information about field 1
    # on multiple lines
    "field3": "#this would be ignored"
  }
}"""

# Each match swallows a run of non-comment lines and/or one comment prefix and is
# replaced by a newline, so separate comment blocks end up divided by a blank line.
rex = re.compile(r"(^(?!\s*#.*?[\r\n]+)(.*?)([\r\n]+|$)|[\r\n]*^\s*#\s*)+", re.MULTILINE)
print(rex.sub("\n", text).strip().split('\n\n'))

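Output:

['Some information about field 1\non multiple lines', 'Some more info on a single line', 'Some information about field 1\non multiple lines\nSome information about field 1\non multiple lines']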
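As a complement to the one-liner above, here is a minimal sketch of a two-step variant, assuming every comment line consists of optional spaces or tabs followed by `#`: one pattern grabs each run of consecutive comment lines, and a second pass strips the markers. The names `COMMENT_BLOCK` and `extract_blocks` are hypothetical and not part of the original answer.

```python
import re

# One or more consecutive lines whose first non-blank character is '#'.
COMMENT_BLOCK = re.compile(r'(?:^[ \t]*#.*(?:\n|\Z))+', re.MULTILINE)


def extract_blocks(text):
    """Return one list of cleaned comment lines per contiguous block."""
    blocks = []
    for match in COMMENT_BLOCK.finditer(text):
        lines = match.group(0).splitlines()
        # Drop indentation, the leading '#' characters and surrounding spaces.
        blocks.append([line.strip().lstrip('#').strip() for line in lines])
    return blocks


sample = '''{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",
    "field3": "#this would be ignored"
  }
}'''

for block in extract_blocks(sample):
    print(block)
```

Because the pattern is anchored at the start of a line, the `#` inside the value of `field3` is not treated as a comment.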

User

Answer 2 · edited 2024-05-14 23:15:05

You can use a deque to keep the last two lines and add some logic to partition the comments into blocks:

src='''\
{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",


    # multiple line comments
    # supported
    # as well 
    "field3": "#this would be ignored"

  }
}
'''

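from collections import deque

d = deque([], 2)  # holds the previous and the current stripped line
blocks = []
for line in src.splitlines():
    d.append(line.strip())
    if d[-1].startswith('#'):
        comment = line.partition('#')[2]
        if len(d) == 2 and d[0].startswith('#'):
            block.append(comment)   # previous line was a comment too: same block
        else:
            block = [comment]       # start a new block
    elif d[0].startswith('#'):
        blocks.append(block)        # a non-comment line closes the block

for i, b in enumerate(blocks):
    print('block {}: \n{}'.format(i, '\n'.join(b)))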

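Prints:

block 0: 
 Some information about field 1
 on multiple lines
block 1: 
 Some more info on a single line
block 2: 
 multiple line comments
 supported
 as well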
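The same partitioning idea can also be sketched without a deque by tracking only the block currently being built. This is a minimal sketch, assuming a block ends at the first non-comment line or at the end of the input; `iter_comment_blocks` is a hypothetical name, and unlike the loop above it also emits a block that runs to the last line.

```python
def iter_comment_blocks(lines):
    """Yield each run of consecutive '#' lines as a list of stripped texts."""
    block = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('#'):
            block.append(stripped[1:].strip())
        elif block:
            # A non-comment line closes the block collected so far.
            yield block
            block = []
    if block:
        # Flush a block that runs to the end of the input.
        yield block


src = '''\
{
  "data": {
    # first block
    # continues here
    "field1": "XXXXXXXXXX",
    "field3": "#this would be ignored"
    # trailing block
'''

for i, b in enumerate(iter_comment_blocks(src.splitlines())):
    print('block {}:\n{}'.format(i, '\n'.join(b)))
```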

User

Answer 3 · edited 2024-05-14 23:15:05

You can use re.findall with the following regex:

>>> m = re.findall(r'\s*#(.*)\s*#(.*)|#(.*)[^#]*', s, re.MULTILINE)
>>> m
[(' Some information about field 1', ' on multiple lines', ''), ('', '', ' Some more info on a single line')]

You can then loop over the tuples in m to print them.
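Since the pattern contains alternatives, re.findall returns one tuple per match with empty strings for the groups that did not take part. A minimal sketch of turning those tuples into printable groups follows; the names `matches` and `groups` are illustrative, and the sample leaves out `field3` on purpose, because this pattern looks for `#` anywhere and would also pick up a `#` inside a string value.

```python
import re

s = '''{
  "data": {
    # Some information about field 1
    # on multiple lines
    "field1": "XXXXXXXXXX",

    # Some more info on a single line
    "field2": "XXXXXXXXXXX"
  }
}'''

matches = re.findall(r'\s*#(.*)\s*#(.*)|#(.*)[^#]*', s)

# Keep only the groups that actually matched and trim the leading space.
groups = [[g.strip() for g in t if g] for t in matches]

for i, g in enumerate(groups, 1):
    print('group {}: {}'.format(i, ' & '.join(g)))
```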

For comment blocks longer than two lines, however, you can use itertools.groupby:

s="""{
  "data": {
    # Some information about field 1
    # on multiple lines
    # threeeeeeeeecomment
    "field1": "XXXXXXXXXX"

    # Some more info on a single line
    "field2": "XXXXXXXXXXX",

    "field3": "#this would be ignored"
  }
}"""
from itertools import groupby

def is_comment(line):
    return line.strip().startswith('#')

# groupby yields runs of consecutive lines that share the same is_comment value;
# keep only the runs made of comment lines.
comments = [list(run) for flag, run in groupby(s.split('\n'), is_comment) if flag]

for i, run in enumerate(comments, 1):
    texts = [t.strip(' #') for t in run]
    print('group {} :{}'.format(i, ' & '.join(texts)))

Result:

group 1 :Some information about field 1 & on multiple lines & threeeeeeeeecomment
group 2 :Some more info on a single line
