Python regex with a large number of patterns: best practice?

import re

theregexlist = ["MR", "DR", "MRS" ... "MISS", "PHD"]  # several hundred
personname = "MR JOEY SMITH"  # other examples are similar, like "BOBBY DR MD JOE"

for theregex in theregexlist:
    if re.search(theregex, personname):
        # do stuff...
        break  # since my list is ordered, I only want the first match

1 answer

User

#1 · Posted 2024-06-08 12:01:26

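A single "big" regex containing all the "particules" (such as "MR", "MS", and so on) will be more efficient: it is compiled only once, and each name needs one search call instead of one call per pattern. Fewer function calls is an optimization in itself.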

If the particules contain special characters, you may need to escape them with re.escape.
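To see why the escaping matters, here is a small sketch. The title "PH.D." is a made-up example (it is not in the list from the question), chosen only because it contains regex metacharacters:

```python
import re

title = "PH.D."  # hypothetical title containing regex metacharacters

# Unescaped, each "." matches any character, so unrelated text matches too.
loose_match = re.search(title, "PHXDX")

# Escaped, the dots are literal and only the real title matches.
escaped = re.escape(title)
strict_miss = re.search(escaped, "PHXDX")
strict_hit = re.search(escaped, "JOE SMITH PH.D.")

# A trailing \b needs a word character on one side of the boundary,
# so it does not match right after the final "." at the end of the string.
bounded = re.search(r"\b" + escaped + r"\b", "JOE SMITH PH.D.")

print(bool(loose_match), strict_miss, strict_hit.group(), bounded)
```

The last line is worth noting if some of your titles end in punctuation: the \b anchors only behave as expected for titles that start and end with a word character.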

You can compile the regex and keep a reference to its search method.

Here is an example:

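import re

# Titles ("particules") to look for; the real list has several hundred entries.
particules = ["MR", "DR", "MRS", "MISS", "PHD"]

# One alternation of all escaped titles, anchored on word boundaries.
regex = r"\b(?:" + "|".join(map(re.escape, particules)) + r")\b"
# Compile once and keep a reference to the bound search method.
search_any_particule = re.compile(regex, flags=re.IGNORECASE).search

personname = "FRED DR FLINTSTONE"

mo = search_any_particule(personname)
if mo:
    print(mo.group())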

You will get: "DR".
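One thing to keep in mind: a combined pattern returns the leftmost match in the name, not the match that comes first in an ordered list. If list order matters, as it does in the question, something like the following may work. This is only a minimal sketch; the helper name first_by_list_order and the priority dict are my own additions, and it assumes titles are compared case-insensitively:

```python
import re

particules = ["MR", "DR", "MRS", "MISS", "PHD"]

# Position of each title in the ordered list (lower index = higher priority).
priority = {p.upper(): i for i, p in enumerate(particules)}

pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, particules)) + r")\b",
    flags=re.IGNORECASE,
)

def first_by_list_order(personname):
    """Return the matched title that appears earliest in `particules`, or None."""
    found = [m.group().upper() for m in pattern.finditer(personname)]
    if not found:
        return None
    return min(found, key=priority.__getitem__)

name = "DR JOEY MR SMITH"
leftmost = pattern.search(name).group()   # leftmost title in the string
by_order = first_by_list_order(name)      # title ranked first in the list
print(leftmost, by_order)
```

This is still a single pass over the name with one compiled pattern; the extra cost is collecting all matches instead of stopping at the first one.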

Edit

The best way to make sure an implementation is efficient is to profile it. For that, you can use the cProfile library.

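For example, profiling a small find_particule function that wraps the compiled search, called one million times, gives a report like this: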

         3000003 function calls in 2.110 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.353    0.353    2.110    2.110 <string>:1(<module>)
  1000000    0.495    0.000    1.757    0.000 python:10(find_particule)
        1    0.000    0.000    2.110    2.110 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.185    0.000    0.185    0.000 {method 'group' of '_sre.SRE_Match' objects}
  1000000    1.078    0.000    1.078    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
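A profiling run like the one above can be sketched as follows. The name find_particule comes from the profiler report, but its body is an assumption (a thin wrapper around the compiled search), and profile_find is a hypothetical helper of my own:

```python
import cProfile
import io
import pstats
import re

particules = ["MR", "DR", "MRS", "MISS", "PHD"]
regex = r"\b(?:" + "|".join(map(re.escape, particules)) + r")\b"
search_any_particule = re.compile(regex, flags=re.IGNORECASE).search

def find_particule(personname):
    # Assumed body: return the matched title, or None when nothing matches.
    mo = search_any_particule(personname)
    return mo.group() if mo else None

def profile_find(n_calls=100_000):
    """Profile n_calls invocations of find_particule and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_calls):
        find_particule("FRED DR FLINTSTONE")
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()

if __name__ == "__main__":
    print(profile_find())
```

Timings will differ from the figures above depending on the machine and Python version; what matters is how the time splits between the search call and the wrapper around it.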
