Python regex findall

Asked 2025-04-17 04:14

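I want to use a regular expression in Python 2.7.2 to extract all tagged words from a string. Put simply, I want to extract every piece of text that sits between [P] and [/P] tags.

Here is my attempt: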

regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(pattern, line)

Printing person gives ['President [P]', '[/P]', '[P] Bill Gates [/P]'].

What is the correct regex to get either ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

5 Answers

Your question is not entirely clear, but I assume you want to find all the text between the [P] and [/P] tags:

>>> import re
>>> line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
>>> re.findall(r'\[P\]\s?(.+?)\s?\[/P\]', line)
['Barack Obama', 'Bill Gates']
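
The question mentions both [p] and [P], so a case-insensitive variant may be useful. The following is only a sketch: `extract_tagged` is a hypothetical helper name, and the lowercase tags in the sample line are an assumption added for illustration.

```python
import re

def extract_tagged(text, tag="P"):
    """Return the text between [tag] and [/tag], ignoring the case of the tag."""
    name = re.escape(tag)  # escape the tag name in case it contains regex metacharacters
    pattern = r"\[%s\]\s*(.+?)\s*\[/%s\]" % (name, name)
    return re.findall(pattern, text, flags=re.IGNORECASE)

# Second pair of tags is lowercase and has no padding spaces (illustrative input).
line = "President [P] Barack Obama [/P] met Microsoft founder [p]Bill Gates[/p], yesterday."
print(extract_tagged(line))  # ['Barack Obama', 'Bill Gates']
```

Using `\s*` instead of `\s?` tolerates any amount of whitespace inside the tags, and the capture group keeps only the inner text.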

Answered 2025-04-17 by Python大师

Try this:

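import re

# Sample input for illustration; substitute your own string.
subject = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
for match in re.finditer(r"\[P[^\]]*\](.*?)\[/P\]", subject):
    # match.start() / match.end(): start and exclusive end of the whole match
    # match.group(): matched text with tags; match.group(1): text between the tags
    print(match.start(), match.end(), match.group(1).strip())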
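
One detail of this pattern is worth spelling out: `[^\]]*` after the `P` accepts any characters other than `]` inside the opening tag. The sketch below shows that effect. The `role=founder` attribute and the `tagged_spans` helper are made up for illustration and do not come from the question.

```python
import re

TAG_PATTERN = re.compile(r"\[P[^\]]*\](.*?)\[/P\]")

def tagged_spans(text):
    """Return (start, end, inner_text) for every [P...]...[/P] match."""
    return [(m.start(), m.end(), m.group(1).strip())
            for m in TAG_PATTERN.finditer(text)]

# The second opening tag carries extra text, which [^\]]* skips over.
sample = "[P] Barack Obama [/P] met [P role=founder] Bill Gates [/P]."
for start, end, name in tagged_spans(sample):
    print(name, "->", sample[start:end])
```

The same flexibility means an opening tag such as [PLACE] would also be accepted, so the stricter `\[P\]` is safer if the tags never carry extra text.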

Answered 2025-04-17 by Python大师

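import re
# r"..." works in Python 2 and 3; the ur"..." prefix is Python 2 only.
# The trailing +? after \] was redundant and is dropped.
regex = r"\[P\] (.+?) \[/P\]"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)
print(person)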

This produces

['Barack Obama', 'Bill Gates']

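The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same as u'[[1P].+?[/P]]+?', except that the latter is easier to read.

The first bracketed group [[1P] is a character class: it tells re to match any single character from the list ['[', '1', 'P']. Likewise, [/P] matches a single '/' or 'P', and the ] after it is taken as a literal bracket. That is not what you want at all. So:

Remove the outer square brackets. (Also remove the stray 1 in front of P.)
To protect the literal brackets in [P], escape them with backslashes: \[P\].
To return only the text inside the tags, put parentheses around .+?.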
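
The three fixes can be checked one at a time. This is a minimal sketch, not part of the original answer. The first pattern writes the asker's character class as `[\[1P]`, which means the same as `[[1P]` and avoids a warning that newer Python versions may raise for a doubled `[`.

```python
import re

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

# The asker's pattern: character classes instead of literal tags.
broken = re.findall(r"[\[1P].+?[/P]]+?", line)
print(broken)     # ['President [P]', '[/P]', '[P] Bill Gates [/P]']

# Outer brackets and the stray 1 removed, literal brackets escaped.
with_tags = re.findall(r"\[P\].+?\[/P\]", line)
print(with_tags)  # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']

# Parentheses added so findall returns only the captured text.
names = re.findall(r"\[P\] (.+?) \[/P\]", line)
print(names)      # ['Barack Obama', 'Bill Gates']
```

When a pattern has exactly one capture group, `re.findall` returns the group's text rather than the whole match, which is why the last call drops the tags.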

Answered 2025-04-17 by Python大师
