Correctly stripping the : char with a regular expression

User

Reply #1 · edited 2024-05-23 20:35:59

I think you meant to pass a raw string to re.sub (note the r prefix):

result = re.sub(r"\b[^\w\d_]+\b", " ",  s ).split()

Returns:

['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
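Note that the last token is still 'wisdom.' with its period. A minimal sketch of why, using a shortened sample string of my own rather than the one from the question: \b needs a word character on one side, and nothing follows the final period, so the trailing \b cannot match there. The sketch also relies on [^\w\d_] being equivalent to \W, since \w already covers digits and the underscore.

```python
import re

s = "gathers wisdom."  # shortened sample, not the original sentence

# [^\w\d_] is equivalent to \W, because \w already includes digits and "_".
with_trailing_boundary = re.sub(r"\b\W+\b", " ", s).split()
print(with_trailing_boundary)  # ['gathers', 'wisdom.']

# Without the trailing \b, the final period is matched and replaced too.
without_trailing_boundary = re.sub(r"\b\W+", " ", s).split()
print(without_trailing_boundary)  # ['gathers', 'wisdom']
```

Dropping the trailing \b is only one option; punctuation at the very start of the string would still be left alone by this pattern.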

User

Reply #2 · edited 2024-05-23 20:35:59

You forgot to make the pattern a raw string literal (r".."):

>>> import re
>>> s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."
>>> re.sub("\b[^\w\d_]+\b", " ",  s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is:', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
>>> re.sub(r"\b[^\w\d_]+\b", " ",  s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']

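User

Reply #3 · edited 2024-05-23 20:35:59

As the other answers point out, you need to define a raw string literal with the r prefix, like this: (r"...")

If you also want to get rid of the period, I believe you can simplify the regex to: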

result = re.sub(r"[^\w' ]", " ", s ).split()

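As you probably know, \w matches word characters (a-z, A-Z, 0-9 and the underscore), so the negated class [^\w' ] replaces everything that is not a word character, an apostrophe, or a space.

Digits are kept as well, so as long as you don't expect numbers in your sentences (or don't mind them staying), that should do the trick.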
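To make the behaviour of the simplified pattern concrete, here is a short sketch. The second sample string is made up for illustration; it shows that apostrophes, digits and underscores survive while other punctuation becomes a separator.

```python
import re

s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."

# Replace everything that is NOT a word character, an apostrophe, or a space.
pattern = r"[^\w' ]"
words = re.sub(pattern, " ", s).split()
print(words[7], words[-1])  # is wisdom

# Hypothetical sample: \w keeps digits and "_", and the class keeps "'".
sample = "It's 2 o'clock, file_name.txt!"
sample_words = re.sub(pattern, " ", sample).split()
print(sample_words)  # ["It's", '2', "o'clock", 'file_name', 'txt']
```

Because no \b is involved, this pattern also removes punctuation at the very start or end of the string.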
