How do I split at word boundaries using a regular expression?

Posted 2024-05-07 23:31:09

I am trying to do this:

import re
sentence = "How are you?"
print(re.split(r'\b', sentence))

The result is the whole sentence returned unsplit, as a single-element list.

I want something like [u'How', u'are', u'you', u'?']. How can I do this?

2 answers

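Unfortunately, before Python 3.7, re.split could not split on an empty (zero-width) match, and \b always matches an empty string.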

To work around this, use findall instead of split.
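
For comparison, here is a minimal sketch that assumes Python 3.7 or later, where re.split does accept patterns that can match an empty string. The filtering step and the names parts and tokens are illustrative and not part of the original answer.

```python
import re

sentence = "How are you?"

# Assumes Python 3.7+: re.split now splits on zero-width matches such as \b.
parts = re.split(r'\b', sentence)
print(parts)

# Drop the empty and whitespace-only pieces, keeping words and punctuation.
tokens = [p.strip() for p in parts if p.strip()]
print(tokens)
```

On older interpreters the findall approach remains the portable choice.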

In fact, \b simply means a word boundary.

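It is roughly equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w), except that \b also matches at the start or end of the string when the adjacent character is a word character.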
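
One way to check this correspondence is to list the positions where each pattern matches. The sketch below uses re.finditer on the question's sentence; the variable names are illustrative.

```python
import re

sentence = "How are you?"

# Positions where \b matches (each match is zero-width).
boundary = [m.start() for m in re.finditer(r'\b', sentence)]

# Positions where the lookaround form matches.
lookaround = [
    m.start()
    for m in re.finditer(r'(?<=\w)(?=\W)|(?<=\W)(?=\w)', sentence)
]

print(boundary)    # [0, 3, 4, 7, 8, 11]
print(lookaround)  # [3, 4, 7, 8, 11]
```

The two lists agree inside the string. Position 0 appears only for \b, because the lookbehinds need a preceding character.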

This means the following code will work:

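import re
sentence = "How are you?"
# \w+ matches each run of word characters; \W+ matches each run of
# non-word characters, so whitespace runs are returned as tokens too.
print(re.findall(r'\w+|\W+', sentence))

import re
# [\w']+ matches words (apostrophes included); [.,!?;] matches one
# punctuation mark. Whitespace matches neither branch and is dropped.
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)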

Output:

['How', 'are', 'you', '?']
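
To see how the two findall patterns differ, the sketch below runs both on the same sentence. The third pattern, \w+|[^\w\s], is a variant added here for illustration and does not come from either answer: it keeps any single non-space punctuation character rather than a fixed list.

```python
import re

sentence = "How are you?"

# \w+|\W+ keeps the whitespace runs between words as separate tokens.
with_gaps = re.findall(r'\w+|\W+', sentence)

# [\w']+|[.,!?;] keeps only words and the listed punctuation marks.
words_and_marks = re.findall(r"[\w']+|[.,!?;]", sentence)

# Illustrative variant: words, or any one character that is neither
# a word character nor whitespace.
any_punct = re.findall(r'\w+|[^\w\s]', sentence)

print(with_gaps)        # ['How', ' ', 'are', ' ', 'you', '?']
print(words_and_marks)  # ['How', 'are', 'you', '?']
print(any_punct)        # ['How', 'are', 'you', '?']
```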


Regex explanation:

"[\w']+|[.,!?;]"

    1st Alternative: [\w']+
        [\w']+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \w match any word character [a-zA-Z0-9_] (in Python 3 str patterns, also Unicode letters and digits unless re.ASCII is set)
            ' the literal character '
    2nd Alternative: [.,!?;]
        [.,!?;] match a single character present in the list below
            .,!?; a single character in the list .,!?; literally
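
The breakdown above can be tried on other inputs. The sketch below uses two made-up sentences to show what each alternative captures and what falls outside both character classes.

```python
import re

pattern = r"[\w']+|[.,!?;]"

# The apostrophe is part of the first class, so "Don't" stays one token.
contraction = re.findall(pattern, "Don't stop, please!")
print(contraction)  # ["Don't", 'stop', ',', 'please', '!']

# Characters in neither class, such as '-' and ':', are silently skipped.
skipped = re.findall(pattern, "well-known: yes")
print(skipped)      # ['well', 'known', 'yes']
```

If other punctuation should be kept, it needs to be added to the second character class.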
