How to exclude sentences that contain specific words

regEX=re.compile('|'.join([r'\bstudent\b',r'\bstudy[ing]\b']),re.I)
matched_data=re.match(regEX,data)
if matched_data is not None:
    continue
else:
    ## write the sentence to excel

1 answer

User

#1 · Posted 2024-05-14 09:41:47

There are two issues here:

1) Use re.search (re.match only looks for a match at the start of the string).
2) The regex should be regEX=re.compile(r"\b(?:{})\b".format('|'.join([r'student',r'study(?:ing)?'])),re.I)
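The difference in point 1 can be seen in a minimal sketch. The sample sentence is made up for illustration and is not taken from the question's data.

```python
import re

pattern = re.compile(r"\bstudent\b", re.I)
sentence = "He is a student here."  # hypothetical sample sentence

# re.match only tries to match at position 0, so a word in the middle is missed
match_result = pattern.match(sentence)
# re.search scans the whole string for the first match
search_result = pattern.search(sentence)

print(match_result)             # None
print(search_result.group(0))   # student
```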

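[ing] matches exactly one character, i, n, or g, whereas you intended to match an optional ing ending. A non-capturing group with the ? quantifier, (?:ing)?, matches one or zero occurrences of the sequence ing.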
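As a small sketch of that difference, the two patterns can be compared on a few test words. The word list, including the artificial "studyi", is an assumption chosen only to make the behavior visible.

```python
import re

char_class = re.compile(r"\bstudy[ing]\b", re.I)          # character class: one of i, n, g
optional_group = re.compile(r"\bstudy(?:ing)?\b", re.I)   # optional "ing" suffix

# Map each test word to (matched by char_class, matched by optional_group)
results = {}
for word in ["study", "studying", "studyi"]:
    results[word] = (bool(char_class.search(word)), bool(optional_group.search(word)))
    print(word, results[word])
# study (False, True)
# studying (False, True)
# studyi (True, False)
```

The character class version only accepts "study" followed by exactly one of the three letters, which is why it misses both words it was meant to catch.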

Also, \b(x|y)\b is a more efficient pattern than \bx\b|\by\b because it involves fewer backtracking steps.
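The two forms are interchangeable in what they match, so switching to the grouped form should not change results. The sketch below only checks that equivalence on a few made-up sentences; it does not measure speed, and the sample list is an assumption.

```python
import re

words = ["student", "study(?:ing)?"]

# One pair of word boundaries around a single alternation group
grouped = re.compile(r"\b(?:{})\b".format("|".join(words)), re.I)
# Word boundaries repeated inside every alternative
repeated = re.compile("|".join(r"\b{}\b".format(w) for w in words), re.I)

def first_match(regex, text):
    """Return the first matched substring, or None when nothing matches."""
    m = regex.search(text)
    return m.group(0) if m else None

samples = ["He is studying here.", "Students arrived.", "A STUDENT left.", "No match here."]
grouped_hits = [first_match(grouped, s) for s in samples]
repeated_hits = [first_match(repeated, s) for s in samples]
print(grouped_hits == repeated_hits)  # True
```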

Here is a demo of this regex:

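import re

# Build one alternation wrapped in a single pair of word boundaries
pat = r"\b(?:{})\b".format('|'.join([r'student', r'study(?:ing)?']))
print(pat)
# => \b(?:student|study(?:ing)?)\b
regEX = re.compile(pat, re.I)  # re.I makes the match case-insensitive
s = "He is studying here."
mObj = regEX.search(s)  # search scans the whole string, unlike match
if mObj:
    print(mObj.group(0))
# => studying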
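To tie this back to the original goal of skipping sentences that contain the words, a minimal sketch might look like the following. The helper name keep_sentences and the sample sentences are hypothetical, and the Excel-writing step from the question is left out, so the kept sentences are simply collected in a list.

```python
import re

EXCLUDE = re.compile(r"\b(?:student|study(?:ing)?)\b", re.I)

def keep_sentences(sentences):
    """Return only the sentences that contain none of the excluded words."""
    return [s for s in sentences if not EXCLUDE.search(s)]

# Hypothetical input; in the question these would come from the user's data
sentences = [
    "He is studying here.",
    "The weather is nice.",
    "Every Student passed.",
    "Studies take time.",
]
kept = keep_sentences(sentences)
print(kept)
# ['The weather is nice.', 'Studies take time.']
```

Note that "Studies" is kept because the pattern only covers student, study, and studying; other word forms would need to be added to the alternation explicitly.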
