Python regex: include the whole field

Posted 2024-04-25 12:44:13


I have a very long string S that contains several substrings of the following form:

[&FULL="583 - node#597 <...a lot more characters inside...> ,REALNAME="node#638"]

That is:

- It starts with [&FULL="<a number with 1 to 3 digits>
- It ends with REALNAME="node#<a number with 1 to 3 digits>"]
- In between there are many characters, including some special characters and spaces
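The three rules above can be expressed as one pattern that matches the whole substring and captures the leading number. This is a minimal sketch, not the poster's code: the sample string `S` is made up, and the names `SUBSTRING` and `S` are only for illustration. `re.VERBOSE` is used so each rule gets its own commented line (note that `#` must be escaped as `\#` in verbose mode).

```python
import re

# Hypothetical sample input; the real string S in the question is much longer.
S = ('A [&FULL="583 - node#597 x, y=(1) ,REALNAME="node#638"] '
     'B [&FULL="12 - node#7 "q" ,REALNAME="node#9"] C')

# One pattern for the whole substring, one line per rule from the list above.
SUBSTRING = re.compile(r'''
    \[&FULL="                    # literal start: [&FULL="
    (\d{1,3})                    # group 1: the number x[i], 1 to 3 digits
    [\s\S]*?                     # anything in between, lazily (stops at the first tail)
    REALNAME="node\#\d{1,3}"\]   # literal end: REALNAME="node#NNN"]
''', re.VERBOSE)

for m in SUBSTRING.finditer(S):
    # group(1) is x[i]; group(0) is the full substring, tail included
    print(m.group(1), '->', m.group(0))
```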

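My goals are:

1. Get a regex that matches all such substrings.
2. Extract only the number after [&FULL=". Call these numbers x[i], for substring i.
3. Replace substring i with x[i].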

As you can imagine, steps 2 and 3 are easy. My partial solution is:

r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'

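- \[&FULL= matches the start of the substring
- [\s\S]*? matches anything in the middle of the substring; the trailing ? makes the quantifier lazy, so it stops at the first tail rather than the last one
- (?=REALNAME="node#\d{1,3}"\]) is a lookahead that requires the tail to follow, but this is where the problem is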
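The laziness of `*?` matters as soon as the string holds more than one substring. The small comparison below is a sketch on a made-up two-substring sample; `greedy` and `lazy` are illustrative names. The greedy version produces one match that runs up to the last tail, while the lazy version stops before each substring's own tail.

```python
import re

# Hypothetical sample containing two substrings.
S = ('[&FULL="1 a ,REALNAME="node#2"] text '
     '[&FULL="3 b ,REALNAME="node#4"]')

greedy = r'\[&FULL=[\s\S]*(?=REALNAME="node#\d{1,3}"\])'
lazy = r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'

# Greedy: a single match spanning from the first start to just before the LAST tail.
print(re.findall(greedy, S))
# Lazy: one match per substring, each ending just before its own tail.
print(re.findall(lazy, S))
```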

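The trailing (?=...) part does not include REALNAME="node#638"] in the result, because a lookahead only asserts that the text follows; it does not consume it. But I want the tail to be part of the match so that I can use the replace() function.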
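To see the lookahead behavior directly, the sketch below runs the question's pattern next to a copy with the `(?=...)` wrapper removed. The sample string is made up for illustration. Only the version without the lookahead consumes the tail, so only its `group(0)` is suitable for `str.replace()`.

```python
import re

# Hypothetical sample containing one substring.
S = 'x [&FULL="583 - node#597 more chars ,REALNAME="node#638"] y'

with_lookahead = r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'
# Same pattern with the lookahead wrapper removed, so the tail is consumed.
consuming = r'\[&FULL=[\s\S]*?REALNAME="node#\d{1,3}"\]'

print(re.search(with_lookahead, S).group(0))  # → [&FULL="583 - node#597 more chars ,
print(re.search(consuming, S).group(0))       # → [&FULL="583 - node#597 more chars ,REALNAME="node#638"]
```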

Edit: current solution

import re

# Matches *almost* everything, except for the bit at the back;
# places the matches in a list
pattern1 = r'\[&FULL=[\s\S]*?(?=REALNAME="node#\d{1,3}"\])'
pattern1_ls = re.findall(pattern1, my_long_string)

# Pattern to match just the back: 'REALNAME=...'
pattern2 = r'REALNAME="node#\d{1,3}"\]'
realnames_ls = re.findall(pattern2, my_long_string)

# Regex to extract NUMBER from each pattern1 result
pattern = r'\[&FULL="\d{1,3}'
for i in range(len(pattern1_ls)):
    # There should be exactly one result
    result = re.findall(pattern, pattern1_ls[i])[0]
    # Drop the first 8 characters, '[&FULL="'
    node_num = result[8:]
    original_pattern = pattern1_ls[i]
    pattern1_ls[i] = [original_pattern, node_num]

# Replace each pattern1 match with its NUMBER
for nd in pattern1_ls:
    my_long_string = my_long_string.replace(nd[0], nd[1])

# Replace each pattern2 match with an empty string (i.e. delete it)
for nm in realnames_ls:
    my_long_string = my_long_string.replace(nm, "")
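If the pattern consumes the tail and captures the number, the two findall passes and the two replace loops above can collapse into a single `re.sub` call. This is a sketch under the assumption that every substring follows the format described in the question; `PATTERN`, `replace_substrings`, and the sample string are hypothetical names and data.

```python
import re

# Whole substring, tail included, with the leading number captured in group 1.
PATTERN = re.compile(r'\[&FULL="(\d{1,3})[\s\S]*?REALNAME="node#\d{1,3}"\]')

def replace_substrings(s: str) -> str:
    """Replace every [&FULL="N ... REALNAME="node#M"] block with N."""
    # r'\1' in the replacement refers back to the captured number.
    return PATTERN.sub(r'\1', s)

# Hypothetical sample input.
my_long_string = ('A [&FULL="583 - node#597 x ,REALNAME="node#638"] '
                  'B [&FULL="12 - node#7 y ,REALNAME="node#9"] C')
print(replace_substrings(my_long_string))  # → A 583 B 12 C
```

On this sample it gives the same result as the multi-step code. Because the replacement happens in one pass over the actual match positions, it also avoids `str.replace()` touching identical text that happens to appear elsewhere in the string.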


1 answer

Anonymous user
#1 · Posted 2024-04-25 12:44:13

What if you just use a group instead, for example:
\[&FULL=[\s\S]*?(?P<string>REALNAME="node#\d{1,3}"\])
Here is a link to an example: https://regex101.com/r/SFiS1G/1
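A quick way to check the answer's suggestion in Python: because the tail now sits in a capturing group instead of a lookahead, `group(0)` is the complete substring and `group('string')` is just the tail. This is a sketch on a made-up sample; `ANSWER`, `S`, and `number` are illustrative names, and the number is pulled out with a second small regex since the answer's pattern does not capture it.

```python
import re

# The answer's pattern: the tail is captured in the named group "string".
ANSWER = r'\[&FULL=[\s\S]*?(?P<string>REALNAME="node#\d{1,3}"\])'

# Hypothetical sample input.
S = 'A [&FULL="583 - node#597 x ,REALNAME="node#638"] B'
m = re.search(ANSWER, S)
print(m.group(0))         # whole substring, tail included
print(m.group('string'))  # just the tail: REALNAME="node#638"]

# Since group(0) now contains the tail, one str.replace() handles the substring.
number = re.match(r'\[&FULL="(\d{1,3})', m.group(0)).group(1)
print(S.replace(m.group(0), number))  # → A 583 B
```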
