Split a paragraph into sentences

Posted 2024-04-26 02:19:14


I'm using the Python code below (which I found online a while ago) to split paragraphs into sentences.

import re

def splitParagraphIntoSentences(paragraph):
    sentenceEnders = re.compile(r"""
        # Split sentences on whitespace between them.
        (?:               # Group for two positive lookbehinds.
          (?<=[.!?])      # Either end-of-sentence punctuation,
        | (?<=[.!?]['"])  # or end-of-sentence punctuation followed by a quote.
        )                 # End group of two positive lookbehinds.
        (?<!  Mr\.   )    # Don't end sentence on "Mr."
        (?<!  Mrs\.  )    # Don't end sentence on "Mrs."
        (?<!  Jr\.   )    # Don't end sentence on "Jr."
        (?<!  Dr\.   )    # Don't end sentence on "Dr."
        (?<!  Prof\. )    # Don't end sentence on "Prof."
        (?<!  Sr\.   )    # Don't end sentence on "Sr."
        \s+               # Split on whitespace between sentences.
        """,
        re.IGNORECASE | re.VERBOSE)
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList

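It works well for my purposes, but now I need exactly the same regex in JavaScript (so that the output stays consistent), and I'm struggling to convert this Python regular expression into a JavaScript-compatible one.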
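For reference, a minimal sketch assuming an engine that supports ES2018 lookbehind assertions (current Node.js, Chrome and Firefox; Safari only from 16.4). JavaScript has no verbose flag, so the Python pattern has to be collapsed onto one line with its comments removed, and re.IGNORECASE becomes the i flag. SENTENCE_ENDERS is a name introduced here for illustration.

```javascript
// Port of the Python pattern above. ASSUMPTION: the engine supports lookbehind (ES2018+).
// No verbose mode in JS, so whitespace and comments are stripped; "i" mirrors re.IGNORECASE.
const SENTENCE_ENDERS =
  /(?:(?<=[.!?])|(?<=[.!?]['"]))(?<!Mr\.)(?<!Mrs\.)(?<!Jr\.)(?<!Dr\.)(?<!Prof\.)(?<!Sr\.)\s+/i;

function splitParagraphIntoSentences(paragraph) {
  // split() with a RegExp splits on every match even without the "g" flag, and
  // because the pattern has no capturing groups only the sentences are returned.
  return paragraph.split(SENTENCE_ENDERS);
}

console.log(splitParagraphIntoSentences('Hello there. I met Mr. Smith today! Did you? "Yes." He left.'));
```

Engines without lookbehind support reject such a pattern with a SyntaxError, which is why a workaround like the one in the answer can still be useful.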


1 answer
User
#1 · Posted 2024-04-26 02:19:14

This isn't a regex you can split on directly, but it works as a workaround:

(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s


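For example, you can replace each match with $1# (or, instead of #, any other character that never occurs in the text) and then split the result on #. It's not a particularly elegant solution, though.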
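The replace-then-split idea described above can be sketched as follows. It keeps the answer's pattern and adds the g flag (needed so that replace() handles every match) and, as an assumption, the i flag to mirror re.IGNORECASE in the question's Python code. MARKER (U+0000 here instead of #) and splitWithPlaceholder are hypothetical names; the marker must never occur in the input.

```javascript
// Sketch of the answer's workaround; no lookbehind required.
// ASSUMPTION: MARKER never appears in the input text (the answer uses "#").
const MARKER = "\u0000";
// The answer's pattern plus "g" (replace all matches) and "i" (assumed, mirrors re.IGNORECASE).
const SENTENCE_END = /(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s/gi;

function splitWithPlaceholder(paragraph) {
  // Step 1: keep the sentence-final word (group 1) and replace the whitespace after it with MARKER.
  const marked = paragraph.replace(SENTENCE_END, "$1" + MARKER);
  // Step 2: split on the marker.
  return marked.split(MARKER);
}

console.log(splitWithPlaceholder('Hello there. I met Mr. Smith today! Did you? "Yes." He left.'));
```

Because this pattern consumes a single whitespace character (\s rather than the \s+ of the Python version), a run of spaces between sentences leaves leading spaces on the next piece: "One.  Two." becomes ["One.", " Two."]. The two approaches therefore do not always produce identical output.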
