Why do I get empty strings when using re.split() in Python?

User

Answer #1 · Edited 2024-06-17 13:01:24

Use + so that each split consumes as many consecutive delimiters as possible, rather than just one:

re.split('[().]+', s)

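Unfortunately, this is not enough: because the string starts and ends with a delimiter, re.split still produces an empty string at the start and at the end of the result: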

['', 'Type', 'Terrorist organization', 'AND', 'Involved in attacks', 'nine-eleven', '']
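To see exactly what the + changes, here is a minimal comparison of the pattern with and without the quantifier. It assumes the input is the sample string from the question (the same one used in the test further down); raw strings are used for the patterns out of habit, though these particular patterns would work without them.

```python
import re

s = '(Type).(Terrorist organization)AND(Involved in attacks).(nine-eleven)'

# Without +: every delimiter character is its own split point, so adjacent
# delimiters such as ").(" leave empty strings between them.
single = re.split(r'[().]', s)

# With +: a run such as ").(" is consumed as one split point, but the leading
# "(" and the trailing ")" still produce an empty string at each end.
runs = re.split(r'[().]+', s)

print(single)
# ['', 'Type', '', '', 'Terrorist organization', 'AND', 'Involved in attacks', '', '', 'nine-eleven', '']
print(runs)
# ['', 'Type', 'Terrorist organization', 'AND', 'Involved in attacks', 'nine-eleven', '']
```

The interior empty strings disappear with +; only the two at the ends remain.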

But you can filter them out afterwards:

[x for x in re.split('[().]+', s) if x]
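A self-contained version of that post-processing step, assuming the same sample string as in the question. An empty string is falsy in Python, so the `if part` condition keeps only the non-empty pieces.

```python
import re

s = '(Type).(Terrorist organization)AND(Involved in attacks).(nine-eleven)'

# Split on runs of delimiters, then drop the empty pieces
# (with + these can only occur at the start and end).
parts = [part for part in re.split(r'[().]+', s) if part]

print(parts)
# ['Type', 'Terrorist organization', 'AND', 'Involved in attacks', 'nine-eleven']
```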

Alternatively, you can invert the regex and use re.findall to match runs of non-delimiter characters (as many as possible) instead:

re.findall('[^().]+', s)

This yields the desired result directly:

['Type', 'Terrorist organization', 'AND', 'Involved in attacks', 'nine-eleven']
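As a quick sanity check, a sketch (again assuming the question's sample string) confirming that matching the non-delimiter runs with re.findall gives the same list as splitting on delimiter runs and discarding the empty pieces:

```python
import re

s = '(Type).(Terrorist organization)AND(Involved in attacks).(nine-eleven)'

# Match what we want to keep (runs of non-delimiters) instead of splitting on delimiters.
found = re.findall(r'[^().]+', s)

# The split-and-filter approach, for comparison.
filtered = [part for part in re.split(r'[().]+', s) if part]

print(found == filtered)  # True
```

Because re.findall only returns actual matches and [^().]+ requires at least one character, it never produces empty strings, which is why no filtering step is needed.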

User

Answer #2 · Edited 2024-06-17 13:01:24

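Your regexp matches ")", "." and "(" each as a separate delimiter. Because they are adjacent in the input, there is an empty string between each pair of them, so the result contains those empty strings.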

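If you want a sequence of delimiters to be treated as a single delimiter, add the + quantifier to the regexp so that it matches the whole sequence at once: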

re.split('[|().]+', x)

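The empty string at the beginning is the (empty) text before the first "(". Likewise, the empty string at the end is the text after the last ")" in the input. I don't think there is a simple way to prevent these; just remove them from the result.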
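A small illustration of that explanation, using the pattern from this answer on two made-up inputs (they are not from the question). Note that | inside a character class is a literal pipe character, not alternation. With +, empty strings are left only when the input begins or ends with a delimiter.

```python
import re

pattern = r'[|().]+'

# No delimiter at either end: no empty strings in the result.
inner_only = re.split(pattern, 'a|b).(c')

# Delimiter at both ends: one empty string before the first "("
# and one after the last ")".
both_ends = re.split(pattern, '(a|b)')

print(inner_only)  # ['a', 'b', 'c']
print(both_ends)   # ['', 'a', 'b', '']
```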

User

Answer #3 · Edited 2024-06-17 13:01:24

You can use filter:

filter(lambda x: x, re.split('[().]+', s))

Test:

import re
s = '(Type).(Terrorist organization)AND(Involved in attacks).(nine-eleven)'
print(list(filter(None, re.split('[().]+', s))))

Output:

['Type', 'Terrorist organization', 'AND', 'Involved in attacks', 'nine-eleven']