I have a very long string S that contains several substrings of the following form:
[&FULL="583 - node#597 <...a lot more characters inside...> ,REALNAME="node#638"]
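That is, each substring:
starts with [&FULL="<a number with 1 to 3 digits>
ends with REALNAME="node#<a number with 1 to 3 digits>"]
My goal is to:
1. Find every such substring in S.
2. Extract the number that follows [&FULL=" in each one. Call these numbers x[i], where i is the index of the substring.
3. Replace substring i with [&branch_num=x[i]].
As you can imagine, steps 2 and 3 are easy. My partial solution is: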
r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'
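\[&FULL= matches the start of the substring.
[\s\S]*? lazily matches anything in the middle of the substring.
(?=REALNAME="node#\d{1,3}"\]) is meant to match the tail of the substring, but this is where the problem lies.
The part inside (?=...) is a lookahead, so the tail REALNAME="node#638"] is not returned in the result, because that is not how a lookahead behaves. I want to keep the tail in the match so that I can use the replace() function.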
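To make the lookahead behaviour concrete, here is a small sketch. The sample string is invented for illustration and is not taken from the real data; the second pattern is simply the first with the (?=...) wrapper removed, which is an assumption about what "keeping the tail" would look like.

```python
import re

# Hypothetical sample input, invented for illustration only.
sample = 'A[&FULL="583 - node#597 misc ,REALNAME="node#638"]B'

# Lookahead version: the tail is asserted but not consumed,
# so it is missing from the returned match.
lookahead = r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'

# Consuming version: the same tail without (?=...),
# so the tail becomes part of the returned match.
consuming = r'\[&FULL=[\s\S]*?REALNAME="node#\d{1,3}"\]'

lookahead_matches = re.findall(lookahead, sample)
consuming_matches = re.findall(consuming, sample)

print(lookahead_matches)
print(consuming_matches)
```

The first list holds the substring only up to the comma before REALNAME, while the second holds the complete bracketed substring.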
Edit: current solution
import re

# Matches *almost* everything, except for the bit at the back;
# places the matches in a list
pattern1 = r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'
pattern1_ls = re.findall(pattern1, my_long_string)

# Pattern to match just the back: 'REALNAME=...'
pattern2 = r'REALNAME="node#\d{1,3}"\]'
realnames_ls = re.findall(pattern2, my_long_string)

# regex to extract NUMBER from each pattern1 result
pattern = r'\[&FULL="\d{1,3}'
for i in range(len(pattern1_ls)):
    # there should be only 1 result
    result = re.findall(pattern, pattern1_ls[i])[0]
    # ditch the first 8 characters, '[&FULL="'
    node_num = result[8:]
    original_pattern = pattern1_ls[i]
    pattern1_ls[i] = [original_pattern, node_num]

# Replace pattern1 with [&branch_num=NUMBER]
for nd in pattern1_ls:
    my_long_string = my_long_string.replace(nd[0], '[&branch_num=' + nd[1] + ']')

# Replace pattern2 with empty string (i.e. delete it)
for nm in realnames_ls:
    my_long_string = my_long_string.replace(nm, "")
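If you just use groups, the tail can be part of the match while the number is still captured separately.
Here is a link to an example: https://regex101.com/r/SFiS1G/1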
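A minimal sketch of the grouping idea, assuming the goal is to turn each substring into [&branch_num=NUMBER] as the code comment in the question states. The pattern and the sample string below are assumptions written for illustration; they are not necessarily the pattern behind the regex101 link.

```python
import re

# Hypothetical sample input, invented for illustration only.
sample = ('x[&FULL="583 - node#597 misc ,REALNAME="node#638"]y'
          '[&FULL="7 - node#12 other\ntext ,REALNAME="node#9"]z')

# Group 1 captures the 1-3 digit number after [&FULL=" .
# The rest of the substring, including the REALNAME tail,
# is consumed by the match but not captured.
pattern = re.compile(r'\[&FULL="(\d{1,3})[\s\S]*?REALNAME="node#\d{1,3}"\]')

# With a single group, findall returns just the captured numbers.
numbers = pattern.findall(sample)

# \1 in the replacement refers to the captured number.
result = pattern.sub(r'[&branch_num=\1]', sample)

print(numbers)
print(result)
```

Because the whole substring is matched in one pass, a single re.sub call covers finding, extracting and replacing, so the separate replace() loops are no longer needed under these assumptions.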