Add a backslash before every non-word character in a string

import re

sentences = ["Disallow DCCP sockets due to such NFC-3456",
             "Check at http://www.n.io/search?query=title++sub/file.html",
             "Specifies the hash algorithm on them"]
url_key = ['www.n.io/search?query=title++sub', 'someweb.org/dirs.io']  # there are thousands of elements
add_key = ['NFC-[0-9]{4}', 'CEZ-[0-9a-z]{4,8}']  # there are hundreds of elements

pattern = url_key + add_key
mykey = re.compile('(?:% s)' % '|'.join(pattern))

for item in sentences:
    if mykey.search(item):
        print(item, ' --> Keyword is found')
    else:
        print(item, ' --> Keyword is not Found')

error                                     Traceback (most recent call last)
<ipython-input-80-5348ee9c65ec> in <module>()
      8 
      9 pattern = url_key + add_key
---> 10 mykey = re.compile('(?:% s)' % '|'.join(pattern))
     11 
     12 for item in sentences:

~/anaconda3/lib/python3.6/re.py in compile(pattern, flags)
    231 def compile(pattern, flags=0):
    232     "Compile a regular expression pattern, returning a pattern object."
--> 233     return _compile(pattern, flags)
    234 
    235 def purge():

~/anaconda3/lib/python3.6/re.py in _compile(pattern, flags)
    299     if not sre_compile.isstring(pattern):
    300         raise TypeError("first argument must be string or compiled pattern")
--> 301     p = sre_compile.compile(pattern, flags)
    302     if not (flags & DEBUG):
    303         if len(_cache) >= _MAXCACHE:

~/anaconda3/lib/python3.6/sre_compile.py in compile(p, flags)
    560     if isstring(p):
    561         pattern = p
--> 562         p = sre_parse.parse(p, flags)
    563     else:
    564         pattern = None

~/anaconda3/lib/python3.6/sre_parse.py in parse(str, flags, pattern)
    853 
    854     try:
--> 855         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    856     except Verbose:
    857         # the VERBOSE flag was switched on inside the pattern.  to be

~/anaconda3/lib/python3.6/sre_parse.py in _parse_sub(source, state, verbose, nested)
    414     while True:
    415         itemsappend(_parse(source, state, verbose, nested + 1,
--> 416                            not nested and not items))
    417         if not sourcematch("|"):
    418             break

~/anaconda3/lib/python3.6/sre_parse.py in _parse(source, state, verbose, nested, first)
    763                 sub_verbose = ((verbose or (add_flags & SRE_FLAG_VERBOSE)) and
    764                                not (del_flags & SRE_FLAG_VERBOSE))
--> 765                 p = _parse_sub(source, state, sub_verbose, nested + 1)
    766                 if not source.match(")"):
    767                     raise source.error("missing ), unterminated subpattern",

~/anaconda3/lib/python3.6/sre_parse.py in _parse_sub(source, state, verbose, nested)
    414     while True:
    415         itemsappend(_parse(source, state, verbose, nested + 1,
--> 416                            not nested and not items))
    417         if not sourcematch("|"):
    418             break

~/anaconda3/lib/python3.6/sre_parse.py in _parse(source, state, verbose, nested, first)
    617                 if item[0][0] in _REPEATCODES:
    618                     raise source.error("multiple repeat",
--> 619                                        source.tell() - here + len(this))
    620                 if sourcematch("?"):
    621                     subpattern[-1] = (MIN_REPEAT, (min, max, item))

error: multiple repeat at position 31

Expected output:

Disallow DCCP sockets due to such NFC-3456 --> Keyword is found
Check at http://www.n.io/search?query=title++sub/file.html --> Keyword is found
Specifies the hash algorithm on them --> Keyword is not found

2 Answers

User

#1 · Edited 2024-04-20 09:53:22

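Your main problem is that Python's string escapes are processed before the regex replacement escapes. Switching to a raw string (which suppresses string escapes) and escaping the backslash (because \\ is itself a replacement escape) fixes this: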

>>> print(re.sub(r'(\W)', r'\\\1', '?:n.io/search?query=title++sub'))
\?\:n\.io\/search\?query\=title\+\+sub
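To make the ordering of the two escape layers concrete, here is a minimal sketch. The variable names and the sample input 'a+b' are illustrative assumptions, not taken from the question.

```python
import re

# Non-raw literal: Python consumes the escapes first.
# '\\' becomes one backslash and '\1' becomes the octal escape chr(1).
plain = '\\\1'

# Raw literal: all four characters (\ \ \ 1) reach the regex engine,
# which then reads '\\' as a literal backslash and '\1' as group 1.
raw = r'\\\1'

print(len(plain), len(raw))
print(repr(plain))

# Hypothetical sample input, used only to show the replacement at work.
escaped = re.sub(r'(\W)', raw, 'a+b')
print(escaped)
```

With the non-raw literal, the group reference never reaches `re.sub`, which is why the raw string is needed.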

Note that you probably don't need escaping this broad. If you only want to escape regex special characters, re.escape does that for you:

>>> print(re.escape('?:n.io/search?query=title++sub'))
\?:n\.io/search\?query=title\+\+sub

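This avoids adding unnecessary escapes (those not needed to strip a character of its special regex meaning). The output above is from Python 3.7 or later. On earlier versions, such as the 3.6 shown in the traceback, re.escape escapes every non-alphanumeric character except the underscore, which is harmless but noisier.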
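Applied to the code in the question, one possible fix is to escape only the literal URL keys and leave the add_key entries as real regex patterns. This is a sketch under that assumption (that url_key holds plain text and add_key holds patterns); the `results` list is an addition for illustration.

```python
import re

sentences = [
    "Disallow DCCP sockets due to such NFC-3456",
    "Check at http://www.n.io/search?query=title++sub/file.html",
    "Specifies the hash algorithm on them",
]
url_key = ['www.n.io/search?query=title++sub', 'someweb.org/dirs.io']  # literal text
add_key = ['NFC-[0-9]{4}', 'CEZ-[0-9a-z]{4,8}']  # genuine regex patterns

# Escape only the literal keys; escaping add_key would break its patterns.
pattern = [re.escape(key) for key in url_key] + add_key
mykey = re.compile('(?:%s)' % '|'.join(pattern))

# Record whether each sentence contains any keyword.
results = [(item, mykey.search(item) is not None) for item in sentences]
for item, found in results:
    print(item, '--> Keyword is found' if found else '--> Keyword is not found')
```

Because `++` is now escaped to `\+\+`, the "multiple repeat" error no longer occurs, and the first two sentences match while the third does not.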

User

#2 · Edited 2024-04-20 09:53:22

You should use raw strings:

result = re.sub(r'(\W)', r'\\\1', mystring)

Or escape the backslashes as well:

result = re.sub('(\\W)', '\\\\\\1', mystring)
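The two spellings are interchangeable because they denote the same strings. A quick check, where the sample value of `mystring` is a hypothetical stand-in:

```python
import re

# Both literals denote the same four characters: \ \ \ 1
raw_repl = r'\\\1'
escaped_repl = '\\\\\\1'
same_replacement = raw_repl == escaped_repl

# Likewise for the pattern: r'(\W)' and '(\\W)' are the same string.
same_pattern = r'(\W)' == '(\\W)'

mystring = 'title++sub'  # hypothetical input
result = re.sub('(\\W)', escaped_repl, mystring)
print(same_replacement, same_pattern, result)
```

The raw-string form is usually easier to read, since every backslash on screen is one that the regex engine actually sees.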
