Python's regex module: a repeated "backreference" doesn't seem to work correctly

Posted 2024-04-27 03:40:09


Note: I am using the alternative regex module from PyPI, not the built-in re.

I have a Python program in which I look for repeated labels of a specific format, separated by commas.

The format is: words… #number

For example, Trial #1, Trial #2, Run #3 and Spring trial #13 all fit this format.

As my regex pattern I use ([\w ]*#\d\d?,)\1*, written as a raw string.
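To make the intent of each piece of that pattern explicit, here is a sketch of the same expression written with the standard re module's re.VERBOSE flag so every part can carry a comment. The sample string is made up for illustration; it is not the actual CSV header.

```python
import re

# Same pattern as ([\w ]*#\d\d?,)\1*, spelled out with re.VERBOSE
pattern = re.compile(r"""
    (              # group 1: a single label followed by its comma
      [\w ]*       #   word characters and spaces (the space inside [] is kept)
      \#           #   a literal hash sign, escaped because it starts a comment here
      \d\d?        #   a one- or two-digit number
      ,            #   the separating comma
    )
    \1*            # zero or more exact repeats of whatever group 1 captured
""", re.VERBOSE)

# Hypothetical sample header fragment
sample = "Run #1,Run #1,Run #1,Run #2,Run #2,"
m = pattern.match(sample)
print(m.group(0))  # Run #1,Run #1,Run #1,  (the whole run of identical labels)
print(m.group(1))  # Run #1,  (just one copy of the label)
```

The whole match and group 1 are different strings; which of the two you get back depends on the function you call, which turns out to be the heart of the problem below.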

In Java and in various regex testing tools, running findall() with this pattern on the following string:

Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3, (...

...) Run #20,Run #20,Run #20,Run #20,Run #20,Run #20,Run #20

returns:

match 1: Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,

match 2: Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,

...etc.

But in Python, it returns:

match 1: Run #1,

match 2: Run #2,

...etc.

I want it to return the first result (the one that Java and the other regex engines return).

Is there something about Python's regex engine that I'm overlooking? Why am I getting this result?

My code is:

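import regex

# Read the first two header lines of the CSV file
with open('Pendulum Data.csv', mode='r') as file:
    header1 = file.readline()
    header2 = file.readline()

# Group 1 captures one label plus its trailing comma; \1* should absorb the repeats
pattern1 = regex.compile(r'([\w ]*#\d\d?,)\1*', flags=regex.V0)
header1Match = pattern1.findall(header1)
for x in header1Match:
    print(x)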

The for loop and print statement are only there so I can see the results.

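(This raises another question: what exactly does regex.findall() return? Is findall() actually returning the result I want, and it only comes out differently when I print it?)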

…And yes, I am using a raw string for my pattern.


1 answer

Answer 1 · posted 2024-04-27 03:40:09

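You are using a capturing group in your regex. When a pattern contains capturing groups, Python's findall returns only the text captured by those groups, not the whole match (a list of strings for one group, a list of tuples for several). So what you are looking for is the finditer function.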

Python re.finditer documentation:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

re.findall documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Here is a small demo using re.finditer:

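import re

# Group 1 captures one label plus its comma; \1* absorbs identical repeats
p = re.compile(r'([\w ]*#\d\d?,)\1*')
test_str = "Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3, (..."
# finditer yields match objects; group() with no argument returns the whole match
print([x.group() for x in p.finditer(test_str)])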

Result:

['Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,', 'Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,', 'Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,']
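For comparison, here is a sketch of what findall does with the same pattern on a shorter, made-up sample string, using the standard re module (the output shown in the question suggests the regex module's findall behaves the same way here). With one capturing group, findall reports only group 1, which is exactly the "Run #1,", "Run #2," output from the question. If you would rather stay with findall, one common workaround is to wrap the whole pattern in an extra outer group; the backreference then has to become \2, because the label is now the second group, and findall returns (whole run, label) tuples.

```python
import re

sample = "Run #1,Run #1,Run #1,Run #2,Run #2,Run #3,"

# One capturing group: findall returns only what group 1 captured
single = re.findall(r'([\w ]*#\d\d?,)\1*', sample)
print(single)  # ['Run #1,', 'Run #2,', 'Run #3,']

# finditer: each item is a match object; group() is the entire match
whole = [m.group() for m in re.finditer(r'([\w ]*#\d\d?,)\1*', sample)]
print(whole)  # ['Run #1,Run #1,Run #1,', 'Run #2,Run #2,', 'Run #3,']

# Outer group around everything: findall now returns (group 1, group 2) tuples,
# where group 1 is the whole run and group 2 is the single label
pairs = re.findall(r'(([\w ]*#\d\d?,)\2*)', sample)
print([outer for outer, _label in pairs])  # same list as `whole`
```

The finditer version is usually the clearer choice, since it does not change the group numbering of the original pattern.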

Casimir is right: for a regex as ordinary as this one, you can simply use the standard re module.
