Why do re.findall() and re.finditer() give different results in Python?

Asked 2025-04-16 18:29

I wrote a regular expression:

import re

p = re.compile(r'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
.*?             #a bunch of chars
(
\|              #either go until a |
|\]\]           #or the last ]]
)
                ''', re.VERBOSE)

I want to use re.findall to get every matching part of a string. I wrote some test code, but the results are odd.

This code

g = p.finditer('   [[Imae|Lol]]     [[sdfef]]')
print(g)  # prints the iterator object itself
for elem in g:
    print(elem.span())   # (start, end) of the whole match
    print(elem.group())  # text of the whole match

gives me this output (omitting the line printed for the iterator object itself):

(3, 10)
[[Imae|
(20, 29)
[[sdfef]]

That looks reasonable, right? But when I do this:

h = p.findall('   [[Imae|Lol]]     [[sdfef]]')
for elem in h:
    print(elem)

the output is:

|
]]

Why does findall() give a different result from finditer()?

5 Answers

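I think the key part of the findall() documentation is:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Your regex has one group here, around the pipe or the closing ]]: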

(
\|              #either go until a |
|\]\]           #or the last ]]
)

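finditer() has no such clause. It yields match objects, and calling group() on one with no arguments returns the whole match.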
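As a minimal sketch of that rule, the snippet below shows how the shape of findall()'s result changes with the number of capturing groups, while finditer() keeps returning match objects. The sample string and the three small patterns are made up for illustration; they are not from the question.

```python
import re

text = "a1 b2"  # hypothetical sample input, not from the question

# No capturing group: findall returns the full matches.
no_group = re.findall(r"[a-z]\d", text)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r"[a-z](\d)", text)

# Two capturing groups: findall returns a list of tuples.
two_groups = re.findall(r"([a-z])(\d)", text)

# finditer yields match objects; group() with no argument is the whole match.
whole = [m.group() for m in re.finditer(r"[a-z](\d)", text)]

print(no_group)    # ['a1', 'b2']
print(one_group)   # ['1', '2']
print(two_groups)  # [('a', '1'), ('b', '2')]
print(whole)       # ['a1', 'b2']
```

The pattern used with finditer() is the same one-group pattern that made findall() return only the digits, which mirrors the difference seen in the question.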

Answered 2025-04-16 by Python大师

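When you pass re.findall() a regex that contains parentheses, it returns the matched groups. Here you have only one group: the trailing | or ]]. In your re.finditer() code you never asked for a particular group, so group() returns the entire match.

If you want re.findall() to do what you need, either wrap the whole regex in parentheses or put parentheses only around the part you actually want to extract. Assuming you are parsing wiki links, the part you need is the "bunch of chars" on line 4 of the pattern. For example: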

p = re.compile(r'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
(.*?)           #a bunch of chars
(
\|              #either go until a |
|\]\]           #or the last ]]
)
                ''', re.VERBOSE)

p.findall('   [[Imae|Lol]]     [[sdfef]]')

[('Imae', '|'), ('sdfef', ']]')]
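For comparison, here is a hedged sketch of reading the same two-group pattern through finditer(), where the whole match and each group stay available on the match object. The variable names `s` and `pairs` are introduced here for illustration only.

```python
import re

p = re.compile(r'''
\[\[            # the first [[
[^:]*?          # no :s are allowed
(.*?)           # a bunch of chars (group 1)
(
\|              # either go until a |
|\]\]           # or the last ]]
)
''', re.VERBOSE)

s = '   [[Imae|Lol]]     [[sdfef]]'

# group(0) is the whole match; group(1) is the captured link text.
pairs = [(m.group(0), m.group(1)) for m in p.finditer(s)]
print(pairs)  # [('[[Imae|', 'Imae'), ('[[sdfef]]', 'sdfef')]
```

With finditer() the choice of group is made per match at read time, so the pattern does not have to be rewritten to change which part is extracted.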

Answered 2025-04-16 by Python大师

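findall returns a list of the matched groups. The parentheses in your regex define a group, so findall assumes you want that group, even though you do not actually need it. (?:...) is a non-capturing group: it groups the alternatives without capturing their text separately. You can change your regex to: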

'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
.*?             #a bunch of chars
(?:             #non-capturing group
\|              #either go until a |
|\]\]           #or the last ]]
)
                '''
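A quick check of that pattern, as a sketch that compiles it with re.VERBOSE the same way the question does and runs it on the question's test string (the names `s`, `matches`, and `spans` are chosen here for illustration):

```python
import re

p = re.compile(r'''
\[\[            # the first [[
[^:]*?          # no :s are allowed
.*?             # a bunch of chars
(?:             # non-capturing group
\|              # either go until a |
|\]\]           # or the last ]]
)
''', re.VERBOSE)

s = '   [[Imae|Lol]]     [[sdfef]]'

# With no capturing groups, findall returns the full matches.
matches = p.findall(s)
print(matches)  # ['[[Imae|', '[[sdfef]]']

# The spans agree with what finditer reported in the question.
spans = [m.span() for m in p.finditer(s)]
print(spans)    # [(3, 10), (20, 29)]
```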

Answered 2025-04-16 by Python大师
