Splitting a string on multiple characters in Python

Asked 2025-04-18 00:37

I want to split a string on multiple characters in Python, the way I do in Java with code roughly like this:

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'?=()!\\[\\]-]+|(?<=\\d)(?=\\D)";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);
System.out.println(Arrays.toString(tokens));

Here is a runnable example that produces the correct output: run example

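I want to do exactly the same thing in Python, but once I add the ' (single quote) character to the regex, it doesn't split at all. How can I get the same result in Python as the Java program above?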

This code:

import re
tokens = re.split(r' \.', line)
print(tokens)

applied to this line:

"let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"

gives:

["let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"]

when what I want is:

[let, s, meet, tomorrow, at, 9, 30, p, 7, 8, pm, i, you, go, no, Go, to, do]


4 Answers

Use the following code:

>>> chars = "[:;'?=()!-]+<"  # Characters to remove
>>> sentence = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"  # Sentence
>>> for k in sentence:  # Loop over every character of the original sentence
...     if k in chars:  # Is it one of the characters we want to remove?
...         sentence = sentence.replace(k, ' ')  # If so, replace every occurrence with a space
...
>>> sentence = sentence.replace('p', ' p').split()  # Space before each 'p' (also covers 'pm'), then split; only safe because no other word here contains a 'p'
>>> sentence
['let', 's', 'meet', 'tomorrow', 'at', '9', '30', 'p', '7', '8', 'pm', 'i', 'you', 'go', 'no', 'Go', 'to', 'do']
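
The character-by-character replace loop above can also be done in one pass with str.maketrans and str.translate. A minimal sketch, assuming the same set of characters to remove and keeping the answer's sentence-specific 'p' step unchanged:

```python
# Characters to turn into spaces (the punctuation used in the answer)
REMOVE = ":;'?=()![]-"

# Translation table mapping each of those characters to a space
table = str.maketrans(REMOVE, " " * len(REMOVE))

sentence = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"

# Translate in a single pass, add the answer's space before 'p',
# then split on runs of whitespace
words = sentence.translate(table).replace("p", " p").split()
print(words)
```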

If you want to use a regex instead:

import re

line = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
tokens = re.split("[ :;'?=()!\\[\\]-]+|(?<=\\d)(?=\\D)", line)
tokens = [token for token in tokens if len(token) != 0]
# tokens is a list, so join it back into a string before adding the space before 'p'
# (on Python 3.7+ re.split already separates '30p' and '8pm', so this step only matters on older versions)
tokens = ' '.join(tokens).replace('p', ' p').split()
print(tokens)
#['let', 's', 'meet', 'tomorrow', 'at', '9', '30', 'p', '7', '8', 'pm', 'i', 'you', 'go', 'no', 'Go', 'to', 'do']
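
The 'p' replacement only works because no other word in this sentence contains a 'p'. A minimal sketch of a more general variant, assuming the goal is to separate any digit from a following non-digit: insert a space at every such boundary with re.sub (which substitutes zero-width matches, unlike re.split before Python 3.7), then split on the character class alone. The names tokenize, SEPARATORS and DIGIT_BOUNDARY are hypothetical:

```python
import re

SEPARATORS = r"[ :;'?=()!\[\]-]+"   # the character class from the Java regex
DIGIT_BOUNDARY = r"(?<=\d)(?=\D)"    # zero width: after a digit, before a non-digit


def tokenize(text):
    """Hypothetical helper: split text the way the Java example does."""
    # Turn each digit/non-digit boundary into a space
    spaced = re.sub(DIGIT_BOUNDARY, " ", text)
    # Split on runs of separator characters and drop empty strings
    return [token for token in re.split(SEPARATORS, spaced) if token]


line = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
print(tokenize(line))
print(tokenize("help 12ab"))  # 'help' is not broken apart
```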

Answered 2025-04-18 by Python大师

The split regex you use in Java should work in Python as well, so this may be a bug.
The confusing part is probably the overlap between \D and [ :;'?=()!\[\]-], and how the engine resolves that overlap.

You could try to work around it by having (?<=\d)(?=\D) tried first, but that takes some trickery.

The regex below forces that order. Is it a workaround?
I don't know, because I don't have Python available to test it, but it works in Perl.

Forced regex:

 #  (?<=\d)(?:[ :;'?=()!\[\]-]+|(?=\D))|(?<!\d|[ :;'?=()!\[\]-])[ :;'?=()!\[\]-]+

    (?<= \d )
    (?:
         [ :;'?=()!\[\]-]+ 
      |  (?= \D )
    )
 |  
    (?<! \d | [ :;'?=()!\[\]-] )
    [ :;'?=()!\[\]-]+
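
Since the answer above was only tested in Perl, here is a minimal Python sketch that compiles the forced regex in its expanded form with re.VERBOSE and uses it with re.split. It assumes Python 3.7 or later, where re.split also splits on the zero-width (?=\D) branch; the name FORCED is only for illustration:

```python
import re

# The forced regex from the answer, in verbose form. re.VERBOSE ignores
# whitespace and # comments except inside character classes, so the
# literal space in [ :;'?=()!\[\]-] is still part of the class.
FORCED = re.compile(
    r"""
      (?<= \d )                        # after a digit, either
      (?:
           [ :;'?=()!\[\]-]+           # consume a run of separators
        |  (?= \D )                    # or split before a non-digit (zero width)
      )
    |
      (?<! \d | [ :;'?=()!\[\]-] )     # elsewhere, only at the start of a run
      [ :;'?=()!\[\]-]+                # of separator characters
    """,
    re.VERBOSE,
)

line = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
tokens = [token for token in FORCED.split(line) if token]  # drop empty strings
print(tokens)
```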

Answered 2025-04-18 by Python大师

Here is a different approach, which finds the tokens instead of splitting on the separators:

>>> import re
>>> s = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
>>> re.findall(r'\d+|[A-Za-z]+', s)
['let', 's', 'meet', 'tomorrow', 'at', '9', '30', 'p', '7', '8', 'pm', 'i', 'you', 'go', 'no', 'Go', 'to', 'do']

If you want letters and digits kept together, use '[0-9A-Za-z]+'. If you want letters, digits, and underscores, use r'\w+'.
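
To illustrate the two variants just mentioned, a short sketch applying both to the question's sentence. With either pattern, '30p' and '8pm' stay single tokens, because digits and letters are no longer matched separately:

```python
import re

s = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"

# Letters and digits together: "30p" and "8pm" each stay in one token
alnum = re.findall(r"[0-9A-Za-z]+", s)

# Letters, digits and underscore; in Python 3, \w also matches non-ASCII letters
word = re.findall(r"\w+", s)

print(alnum)
print(word)
```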

Answered 2025-04-18 by Python大师

Use the same regex you used in Java:

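import re

line = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
tokens = re.split(r"[ :;'?=()!\[\]-]+|(?<=\d)(?=\D)", line)
tokens = [token for token in tokens if len(token) != 0] # remove empty strings!
print(tokens)
# Python 3.7+: ['let', 's', 'meet', 'tomorrow', 'at', '9', '30', 'p', '7', '8', 'pm', 'i', 'you', 'go', 'no', 'Go', 'to', 'do']
# Python < 3.7 ignores the zero-width (?<=\d)(?=\D) match, giving:
# ['let', 's', 'meet', 'tomorrow', 'at', '9', '30p', '7', '8pm', 'i', 'you', 'go', 'no', 'Go', 'to', 'do']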
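
The list comprehension above removes every empty string, whereas Java's String.split(regex) with its default limit of 0 only removes trailing empty strings. A minimal sketch of a hypothetical helper, java_like_split, that mimics just that trailing-empty rule on top of re.split. It is an approximation rather than a full port (for example, Java 8 and later never produce an empty leading string from a zero-width match at the start, while Python does), and it assumes Python 3.7+ for the zero-width split:

```python
import re


def java_like_split(pattern, text):
    """Hypothetical helper approximating Java's String.split(regex):
    trailing empty strings are dropped, interior ones are kept."""
    parts = re.split(pattern, text)
    # Drop empty strings only from the end, as Java does with limit 0
    while parts and parts[-1] == "":
        parts.pop()
    return parts


line = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]"
print(java_like_split(r"[ :;'?=()!\[\]-]+|(?<=\d)(?=\D)", line))
print(java_like_split(",", "a,,b,,"))  # interior empty string kept
```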

Answered 2025-04-18 by Python大师
