Using a regex to group lines that end with a colon ":"

import collections
import re

class Group:
    def __init__(self):
        self.members = []
        self.text = []

with open('out.txt', 'r') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:
            # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members.extend(group_members.split(','))
            current_group = group_name

for group_name, group in groups.items():
    print("%s(%s)" % (group_name, ','.join(group.members)))

3 answers

User

Answer 1 · edited 2024-04-25 16:44:19

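In the regex, just add ":" at the end and make it optional by putting "?" right after it, so the pattern matches both formats (with and without the trailing colon):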

(\S+(?: __[^__]*__)?)\((.*)\):?$
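A quick check of the pattern above, assuming two sample lines in the format used by the other answers (one with the trailing colon, one without). The sample strings are illustrative, not the question's actual `out.txt`:

```python
import re

# Answer 1's pattern: the trailing ":" is optional, so both formats match.
pattern = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\):?$')

samples = ['car __name__(skoda,audi):', 'car __name__(benz)']
results = [pattern.match(s).groups() for s in samples]

for groups in results:
    print(groups)
# ('car __name__', 'skoda,audi')
# ('car __name__', 'benz')
```

Because "$" is still there, anything other than an optional ":" after the closing parenthesis still makes the match fail.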

User

Answer 2 · edited 2024-04-25 16:44:19

You can do this without a regular expression:

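f = ['car __name__(skoda,audi):\n', 'car __name__(benz):\n']
groups = {}
for line in f:
    # 'car __name__(skoda,audi):' -> ['car ', 'name', '(skoda,audi):']
    v = line.strip().split('__')
    gname, gitems = v[1], v[2]
    # drop the surrounding "(", ")" and ":" and split the members on commas
    gitems = gitems.strip("():").split(",")
    groups[gname] = groups.get(gname, []) + gitems
print(groups)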
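The split-based approach above can be wrapped in a small function to check that it does not actually depend on the trailing colon: str.strip("():") removes any run of "(", ")" and ":" from both ends. The function name `group_items` is hypothetical, and like the answer it assumes each line contains exactly one `__name__` marker.

```python
def group_items(lines):
    """Group comma-separated items by the name between the double underscores."""
    groups = {}
    for line in lines:
        # 'car __name__(benz)' -> ['car ', 'name', '(benz)']
        parts = line.strip().split('__')
        gname, gitems = parts[1], parts[2]
        # strip("():") handles both '(benz)' and '(benz):'
        groups.setdefault(gname, []).extend(gitems.strip("():").split(","))
    return groups

print(group_items(['car __name__(skoda,audi):\n', 'car __name__(benz)\n']))
# {'name': ['skoda', 'audi', 'benz']}
```

Note that the grouping key here is the bare name ("name"), not the "car __name__" prefix that the regex in the question captures.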

User

Answer 3 · edited 2024-04-25 16:44:19

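The problem is the "$" at the end of your regex. It forces the regex to look for a pattern that ends with the closing parenthesis, so a line with a trailing ":" never matches.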
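A minimal reproduction of the failure, using the question's original pattern on sample lines in the same format as the other answers (the samples are assumed, not taken from the question's `out.txt`):

```python
import re

# The question's pattern: "\)$" requires ")" to be the last character.
original = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\)$')

with_colon = 'car __name__(skoda,audi):'
without_colon = 'car __name__(benz)'

print(original.match(with_colon))              # None: ":" follows ")"
print(original.match(without_colon).groups())  # ('car __name__', 'benz')
```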

You can fix this by removing the "$" from the regex (if you expect other trailing characters after the closing parenthesis):

(\S+(?: __[^__]*__)?)\((.*)\)
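A sketch of what dropping the anchor changes, on illustrative sample lines: re.match only needs the pattern to match a prefix of the line, so a trailing ":" or any other trailing text is simply ignored.

```python
import re

# Without "$", re.match succeeds as soon as a prefix of the line matches.
no_anchor = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\)')

for line in ['car __name__(skoda,audi):', 'car __name__(benz) trailing text']:
    print(no_anchor.match(line).groups())
# ('car __name__', 'skoda,audi')
# ('car __name__', 'benz')
```

One caveat: "(.*)" is greedy, so if the trailing text contains another ")", the members group runs up to the last ")" on the line (for "car __name__(benz) (x)" it captures "benz) (x").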

Or you can adjust the regex so that the colon may appear zero or one time at the end of the pattern:

(\S+(?: __[^__]*__)?)\((.*)\):?$
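For reference, here is the question's loop with only the pattern changed to the ":?$" version. This is a minimal sketch: the `io.StringIO` sample is a hypothetical stand-in for `out.txt`, whose real contents are not shown, and the unused `text`/`current_group` bookkeeping is dropped.

```python
import collections
import io
import re

class Group:
    def __init__(self):
        self.members = []

# Hypothetical stand-in for the question's out.txt.
sample_file = io.StringIO('car __name__(skoda,audi):\ncar __name__(benz)\n')

group_pattern = re.compile(r'(\S+(?: __[^__]*__)?)\((.*)\):?$')
groups = collections.defaultdict(Group)

for line in sample_file:
    m = group_pattern.match(line.strip())
    if m:  # a group definition line, with or without the trailing ":"
        group_name, group_members = m.groups()
        groups[group_name].members.extend(group_members.split(','))

for group_name, group in groups.items():
    print("%s(%s)" % (group_name, ','.join(group.members)))
# car __name__(skoda,audi,benz)
```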
