Regex to find all sentences in a text?

User

Reply #1 · Edited 2024-06-16 09:42:57

Edited: it now also handles sentences that span multiple lines.

>>> import re
>>> t = "OMG is this a question ! Is this a sentence ? My\n name is."
>>> re.findall(r"[A-Z].*?[.!?]", t, re.MULTILINE | re.DOTALL)
['OMG is this a question !', 'Is this a sentence ?', 'My\n name is.']

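The only thing left to explain is re.DOTALL: it makes . also match the newline character, as described in the Python re module documentation. (re.MULTILINE is not actually needed here, because the pattern uses no ^ or $ anchors.)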
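
A small comparison, using a made-up two-line string, shows what re.DOTALL changes for the lazy pattern above: without the flag, . stops at the newline, so the sentence split across two lines is not found at all. The sample text is an assumption chosen only for illustration.

```python
import re

# Hypothetical sample: one sentence broken across a line.
text = "My\n name is."

# The lazy pattern from the answer above, written as a raw string.
pattern = r"[A-Z].*?[.!?]"

# Without DOTALL, '.' cannot cross the newline, so nothing matches.
without_dotall = re.findall(pattern, text)

# With DOTALL, '.' also matches '\n', so the whole sentence is found.
with_dotall = re.findall(pattern, text, re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # ['My\n name is.']
```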

User

Reply #2 · Edited 2024-06-16 09:42:57

There are two problems with your regex:

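1. Your expression is anchored by ^ and $, the "start of line" and "end of line" anchors (without re.MULTILINE they match only at the start and end of the whole string). This means your pattern expects to match an entire line of text.
2. You are searching for \s+ before the punctuation, which requires one or more whitespace characters there. If there is no space before the punctuation mark, the expression will not match.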
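
The asker's original regex is not shown on this page, so the sketch below uses a hypothetical pattern that merely combines the two features described (^...$ anchors and \s+ before the punctuation) to show how each one blocks a match.

```python
import re

text = "OMG is this a question ! Is this a sentence ? My. name is."

# Hypothetical reconstruction (the real pattern is not shown):
# anchored with ^...$ and requiring \s+ before the punctuation.
anchored = r"^[A-Z][^.!?]*\s+[.!?]$"
# Problem 1: without re.MULTILINE, ^ and $ refer to the whole string,
# so this can only match if the entire text is one sentence.
anchored_hits = re.findall(anchored, text)

# Dropping the anchors finds sentences, but \s+ still rejects "My."
# because there is no whitespace before its period.
unanchored = r"[A-Z][^.!?]*\s+[.!?]"
unanchored_hits = re.findall(unanchored, text)

# Problem 2 fixed: make the whitespace optional.
fixed = r"[A-Z][^.!?]*\s*[.!?]"
fixed_hits = re.findall(fixed, text)

print(anchored_hits)    # []
print(unanchored_hits)  # ['OMG is this a question !', 'Is this a sentence ?']
print(fixed_hits)       # ['OMG is this a question !', 'Is this a sentence ?', 'My.']
```

In the fixed pattern, \s* is technically redundant, since [^.!?]* already consumes any spaces before the punctuation; it is kept only to mirror the original structure.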

User

Reply #3 · Edited 2024-06-16 09:42:57

Something like this works:

## pattern: uppercase letter, then anything that is not in (.!?), then one of them
>>> import re
>>> pat = re.compile(r'([A-Z][^\.!?]*[\.!?])', re.M)
>>> pat.findall('OMG is this a question ! Is this a sentence ? My. name is.')
['OMG is this a question !', 'Is this a sentence ?', 'My.']

Note that "name is." is not in the result, because it does not start with an uppercase letter.
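
Because a negated character class such as [^.!?] matches every character except the listed ones, newlines included, this pattern should also handle the multi-line sample from the first answer without re.DOTALL. A quick check, reusing that sample string:

```python
import re

# Multi-line sample taken from the first answer.
t = "OMG is this a question ! Is this a sentence ? My\n name is."

# [^.!?] matches any character except . ! ? (including '\n'),
# so no DOTALL flag is needed to cross the line break.
pat = re.compile(r"[A-Z][^.!?]*[.!?]")
sentences = pat.findall(t)

print(sentences)
# ['OMG is this a question !', 'Is this a sentence ?', 'My\n name is.']
```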

Your problem comes from using the ^ and $ anchors, which apply to the whole text.
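
To see the difference the flag makes, here is a small sketch with a hypothetical text containing one sentence per line: the same anchored pattern finds nothing by default, but matches each line once re.MULTILINE makes ^ and $ work at line boundaries.

```python
import re

# Hypothetical sample: one sentence per line.
text = "Hello there.\nHow are you?"
anchored = r"^[A-Z][^.!?]*[.!?]$"

# Default: ^ and $ match only at the start and end of the whole text.
whole_text = re.findall(anchored, text)

# MULTILINE: ^ and $ also match right after / right before each newline.
per_line = re.findall(anchored, text, re.MULTILINE)

print(whole_text)  # []
print(per_line)    # ['Hello there.', 'How are you?']
```

This only helps when every sentence sits on its own line; for several sentences on one line, as in the question, dropping the anchors is the simpler fix.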
