Check whether a string contains at least one of the strings in a list

User

Answer 1 · Edited 2024-06-06 13:58:15

# Please do not name a list "list" -- it shadows the built-in
lst = ["a", "b", "c"]
line = "a line of text"  # the string to search in
if any(s in line for s in lst):
    print("found")  # Do stuff

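The code above tests whether any item in lst can be found in line. If so, the "# Do stuff" branch runs. any() stops at the first item that matches.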

See the demo below:

>>> lst = ["aq", "bs", "ce"]
>>> if any(s in "aqwerqwerqwer" for s in lst):
...     print(True)
...
True
>>> if any(s in "qweqweqwe" for s in lst):
...     print(True)
...
>>>
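If you also need to know which item matched, not just whether one did, a small variation passes the same generator expression to next() with a default. This is a minimal sketch; the helper name first_match is hypothetical and not part of the original answer.

```python
def first_match(line, candidates):
    """Return the first item of candidates that occurs in line, or None.

    Hypothetical helper: like any(), next() stops at the first hit.
    """
    return next((s for s in candidates if s in line), None)


lst = ["aq", "bs", "ce"]
print(first_match("aqwerqwerqwer", lst))  # aq
print(first_match("qweqweqwe", lst))      # None
```

The result follows the order of the list, not the position in the string: the first candidate that occurs anywhere in line is returned.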

User

Answer 2 · Edited 2024-06-06 13:58:15

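This is actually a good use case for the regular expression engine, with the regular expression built automatically from the list of strings.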

Try:

import re

def re_match(strings_to_match, my_file):
    # Build one regular expression that matches any of the strings literally
    expression = re.compile(
        '(' +
        '|'.join(re.escape(item) for item in strings_to_match) +
        ')')

    # Return True only if every line contains at least one of the strings
    for line in my_file:
        if not expression.search(line):
            return False
    return True
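The re.escape call matters: without it, a search string such as a.c would be treated as a pattern in which "." matches any character. The sketch below builds the same kind of alternation; the helper name build_matcher and the sample strings are hypothetical.

```python
import re


def build_matcher(strings_to_match):
    """Compile one regex that matches any of the given literal strings."""
    # re.escape makes metacharacters such as '.', '+', '(' match literally.
    return re.compile("|".join(re.escape(s) for s in strings_to_match))


matcher = build_matcher(["a.c", "1+1"])
print(bool(matcher.search("xx a.c yy")))   # True: literal "a.c"
print(bool(matcher.search("xx abc yy")))   # False: the '.' is escaped
print(bool(matcher.search("1+1=2")))       # True
```

Note that an empty list of strings produces an empty pattern, which matches every line; re_match above behaves the same way, since "()" also matches the empty string.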

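The regular expression will be faster than naively scanning every line once per string. There are two reasons: the regular expression engine is implemented in C, and the expression is compiled into a state machine that, in principle, examines each input character only once instead of many times as in the naive solution.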

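See the comparison in this IPython notebook: http://nbviewer.ipython.org/gist/liori/10170227. The test data consisted of 3000 strings matched against a list of 1 million lines. On my machine the naive approach took 1 min 46 s, while this solution took only 9.97 s.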
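For reference, the naive approach being compared against presumably looks something like the following; this is an assumption, since the notebook's exact code is not reproduced here. Both functions (hypothetical names) check that every line contains at least one of the strings, as re_match does, so they should always agree.

```python
import re


def naive_match(strings_to_match, lines):
    # Naive: for each line, try every string in turn with the `in` operator.
    return all(any(s in line for s in strings_to_match) for line in lines)


def regex_match(strings_to_match, lines):
    # Same check, but one compiled alternation scans each line.
    expression = re.compile("|".join(re.escape(s) for s in strings_to_match))
    return all(expression.search(line) for line in lines)


words = ["foo", "bar", "b+z"]
print(naive_match(words, ["a foo", "bar b"]), regex_match(words, ["a foo", "bar b"]))  # True True
print(naive_match(words, ["a foo", "baz"]), regex_match(words, ["a foo", "baz"]))      # False False
```

Timing the two on your own data (for example with the timeit module) is the reliable way to see whether the regex wins in your case.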

User

Answer 3 · Edited 2024-06-06 13:58:15

You can use itertools.groupby:

from itertools import groupby

pats = ['pat', 'pat2']  # … more patterns
lines = ['a pat here', 'nothing', 'pat2 too']  # example input: any iterable of strings
matches = groupby(lines, key=lambda line: any(pat in line for pat in pats))

If the patterns are all single characters, you can use a set:

pats = set('abcd')
# key: the set of pattern characters that occur in each line
matches = groupby(lines, key=pats.intersection)

This produces something like:

[(matched patterns, lines matched),
 (empty set, lines not matched),
 (matched patterns, lines matched),
 …]

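(except that it is an iterator rather than a list, and each group of lines is itself an iterator). That is the core logic. Below is one way to iterate over the preprocessed iterator to produce output: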
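To see that structure concretely, the iterator can be materialized into a list, turning each group into a list of lines. This sketch uses the set-based key with made-up sample lines. Note that groupby only merges consecutive lines with the same key, so the same key (here the empty set) can appear in more than one group.

```python
from itertools import groupby

pats = set("abcd")
lines = ["cab", "xyz", "qqq", "dd", "a", "zz"]  # hypothetical sample input

# Key for each line: the set of pattern characters it contains.
groups = [(key, list(grp)) for key, grp in groupby(lines, key=pats.intersection)]
for key, grp in groups:
    print(sorted(key), grp)
# ['a', 'b', 'c'] ['cab']
# [] ['xyz', 'qqq']
# ['d'] ['dd']
# ['a'] ['a']
# [] ['zz']
```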

for matched_pats, linegrp in matches:
    for line in linegrp:
        if matched_pats:
            print('"{}" matched because of "{}"'.format(line, matched_pats))
        else:
            print('"{}" did not match'.format(line))
