I am using the following Python code (found online a while ago) to split paragraphs into sentences.
import re

def splitParagraphIntoSentences(paragraph):
    sentenceEnders = re.compile(r"""
        # Split sentences on whitespace between them.
        (?:               # Group for two positive lookbehinds.
          (?<=[.!?])      # Either an end of sentence punct,
        | (?<=[.!?]['"])  # or end of sentence punct and quote.
        )                 # End group of two positive lookbehinds.
        (?<!  Mr\.   )    # Don't end sentence on "Mr."
        (?<!  Mrs\.  )    # Don't end sentence on "Mrs."
        (?<!  Jr\.   )    # Don't end sentence on "Jr."
        (?<!  Dr\.   )    # Don't end sentence on "Dr."
        (?<!  Prof\. )    # Don't end sentence on "Prof."
        (?<!  Sr\.   )    # Don't end sentence on "Sr."
        \s+               # Split on whitespace between sentences.
        """,
        re.IGNORECASE | re.VERBOSE)
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList
It has served my purpose well, but now I need exactly the same regex in JavaScript (to keep the output consistent), and I am struggling to convert this Python regex into a JavaScript-compatible one.
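This is not a regex that splits directly, but a workaround:
DEMO
For example, you can replace each matched fragment with $1# (or, instead of #, with another character that does not occur in the text), and then split on #: DEMO. It is not a particularly elegant solution, though.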
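The replace-then-split idea can be sketched in JavaScript roughly as follows. The pattern behind the DEMO links is not reproduced here, so this sketch uses its own pattern, and it swaps the literal $1# replacement string for a replacement callback that checks the abbreviation list by hand. The helper name, the ABBREVIATION_AT_END constant, and the default separator are assumptions made for illustration only.

```javascript
// Hypothetical helper mirroring the Python function's behaviour.
// Matches one of the listed abbreviations at the very end of a string.
const ABBREVIATION_AT_END = /(?:Mr|Mrs|Jr|Dr|Prof|Sr)\.$/i;

function splitParagraphIntoSentences(paragraph, separator = "\u0000") {
  // End-of-sentence punctuation, an optional closing quote,
  // then the whitespace between sentences.
  const boundary = /([.!?]['"]?)\s+/g;

  const marked = paragraph.replace(boundary, (match, ender, offset, text) => {
    // Emulate the negative lookbehinds: look at the few characters
    // that end right before the whitespace ("Prof." is the longest, 5).
    const end = offset + ender.length;
    const before = text.slice(Math.max(0, end - 5), end);
    // Keep the text unchanged after an abbreviation; otherwise drop the
    // whitespace and mark the boundary with the separator.
    return ABBREVIATION_AT_END.test(before) ? match : ender + separator;
  });

  // The separator must not occur in the input text.
  return marked.split(separator);
}

console.log(splitParagraphIntoSentences("Hello Mr. Smith. How are you? Fine."));
```

The callback is used because a plain replacement string cannot skip the abbreviation cases. In engines that support ES2018 lookbehind assertions, the lookbehinds from the Python pattern could probably be kept and passed to split directly, but that has not been checked against the original pattern here.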