Regular expression to match multiple delimiters

Published 2024-06-12 08:56:36

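I'm trying to split on the following delimiters: period, semicolon, *, +, ? and -. However, I only want to split on "-" when it appears at the start of a sentence (so that words like "non-functional" are not split).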

I tried the following but made no progress; any help would be appreciated:

sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)

Here is the sample text I have been testing with:

- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon                                                                          
* See this case mis-alignment

The expected output after splitting is a list of items:

Text Editor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment

3 Answers

You can use re.split like this:

>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
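The expected output in the question also joins the wrapped "Improved / stability" item into one line and keeps "mis-alignment" intact. Here is a minimal sketch of one way to get there, building on the pattern above. The helper name `split_bullets` is hypothetical, and collapsing all whitespace inside each item is an assumption about what the asker wants:

```python
import re

# Bullet markers only count at the start of a line;
# re.MULTILINE makes ^ match after every newline.
BULLET = re.compile(r'\s*^[-*+?]+\s*', re.MULTILINE)

def split_bullets(text):
    """Split text on leading bullet markers and collapse whitespace inside each item."""
    items = []
    for chunk in BULLET.split(text):
        chunk = ' '.join(chunk.split())  # joins wrapped lines and trims the edges
        if chunk:                        # drop the empty piece before the first bullet
            items.append(chunk)
    return items

sample = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

print(split_bullets(sample))
```

Because the hyphen in "mis-alignment" is not at the start of a line, the pattern never matches it.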

If you want to split the string on a defined set of delimiters, do the following:

>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']

If you don't want the delimiters in the resulting list, do this instead:

>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
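A note on why the hyphen sits at the end of the character class: inside `[...]`, a hyphen between two characters defines a range, so the `[.-;]` from the question matches every character from `.` to `;` in code-point order, digits and `:` included, but not the hyphen itself. A small sketch to illustrate (the `demo` string is made up for this example):

```python
import re

demo = "v1.2: a-b;c"

# '.-;' is a range covering '.', '/', '0'-'9', ':' and ';'.
as_range = re.findall(r'[.-;]', demo)

# With the hyphen last, the class matches only the literal '.', ';' and '-'.
as_literal = re.findall(r'[.;-]', demo)

print(as_range)    # ['1', '.', '2', ':', ';']
print(as_literal)  # ['.', '-', ';']
```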

Edit: in response to your comment below, use \s to match whitespace:

>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
    * Updated Dropbox support 
    * Improved
    stability
    - New icon'''
>>> # raw string, so that \s reaches the regex engine unchanged
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+', txt)
>>> # the capture groups also land in the result; drop them along with empty strings
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n    stability', 'New icon']

Try enumerating the delimiters like this:

re.split(r"[.;*+?] ", txt)
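This pattern only splits when a delimiter is followed by a space, and it leaves the hyphen out. Below is a sketch of the same idea with the hyphen added, under the assumption that a hyphen followed by a space is always a bullet marker. A hyphen inside a word such as "mis-alignment" has no space after it, so it is left alone:

```python
import re

txt = "- New icon\n* See this case mis-alignment"

# Split on any listed delimiter that is immediately followed by a space,
# then strip leftover whitespace and drop empty pieces.
parts = [p.strip() for p in re.split(r'[-.;*+?] ', txt) if p.strip()]

print(parts)  # ['New icon', 'See this case mis-alignment']
```

Note that this variant would also split ordinary prose such as "end. Next" or "a - b", so it only fits input where those delimiters really do separate items.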
