Regular expression to match all strings from multiple lists

import re

brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']
regular_expression = rf"({'|'.join(brand_list)}) ([^\s]+)"

description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""

# The original post read the text from
# soup_content.find('blockquote', {"class": "postcontent restore"}).text;
# description stands in for it here so the snippet runs on its own.
print([" ".join(t) for t in re.findall(regular_expression, description, re.IGNORECASE)])

2 answers

User

Answer 1 · edited 2024-06-01 00:21:57

You can use exactly the same approach as for the other regular expression:

regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

Output:

['SEIKO 44-9990 Gold Medallion', 'Seiko NE57 auto', 'ScURFA 42342']

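The " *" between the two joined lists matches zero or more spaces, so the brand and the model are matched whether or not a space separates them.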
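To see the effect of " *" in isolation, here is a minimal sketch. The one-brand, one-model pattern and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical reduced pattern: one brand, " *", one model.
pattern = r"(seiko) *(NE57)"

# Sample strings with one space, no space, and several spaces.
samples = ["Seiko NE57", "SeikoNE57", "Seiko   NE57"]

# " *" accepts zero or more spaces, so every sample should match.
results = [re.search(pattern, s, re.IGNORECASE) is not None for s in samples]
print(results)
```

If a space should be mandatory, " +" (or \s+, as in the second answer) would be the stricter choice.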

Edit:

The full code I used for testing:

import re

brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']

regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""

print([" ".join(t) for t in re.findall(regular_expression, description, re.IGNORECASE)])

Edit 2:

Added a trailing question mark to make the model group optional.
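A short sketch of what the trailing question mark changes, assuming a made-up input in which one brand appears without a model. With capturing groups, re.findall returns an empty string for a group that did not take part in the match.

```python
import re

brand_list = ['scurfa', 'seagull', 'seiko']
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']

regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

# Hypothetical text: "Seagull" has no model after it, "Seiko" does.
text = "A Seagull with no model, and a Seiko NE57 auto"

matches = re.findall(regular_expression, text, re.IGNORECASE)
print(matches)

# Joining the tuples leaves a trailing space when the model is missing,
# so strip() tidies the result.
joined = [" ".join(t).strip() for t in matches]
print(joined)
```

Without the question mark, the brand-only occurrence would not be reported at all.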

User

Answer 2 · edited 2024-06-01 00:21:57

You can use

import re
brand_list = ['scurfa', 'seagull', 'seiko']
description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']
regular_expression = rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?"
print(re.findall(regular_expression, description, re.IGNORECASE))

Output: ['SEIKO 44-9990 Gold Medallion', 'Seiko NE57 auto', 'ScURFA 42342']

See the online Python demo.

The rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?" part builds the pattern (?:scurfa|seagull|seiko)(?:\s+(?:44-9990 Gold Medallion|NE57 auto|42342))? (see its online demo). This pattern matches scurfa, seagull, or seiko, optionally followed by one or more whitespace characters and then 44-9990 Gold Medallion, NE57 auto, or 42342.
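The lists in this thread contain no regex metacharacters, so a plain join works. As an optional hardening that the answer does not mention, the sketch below escapes each entry with re.escape and puts longer alternatives first. The helper name build_pattern and the extra model names ('NE57', 'C.3') are hypothetical.

```python
import re

def build_pattern(brands, models):
    """Build the brand/optional-model pattern from plain strings (hypothetical helper)."""
    # Longest first, so 'NE57 auto' is tried before its prefix 'NE57'.
    brand_alt = "|".join(re.escape(b) for b in sorted(brands, key=len, reverse=True))
    model_alt = "|".join(re.escape(m) for m in sorted(models, key=len, reverse=True))
    return rf"(?:{brand_alt})(?:\s+(?:{model_alt}))?"

# 'C.3' is a made-up model name containing a regex metacharacter.
pattern = build_pattern(['seiko'], ['NE57', 'NE57 auto', 'C.3'])

found = re.findall(pattern, "Seiko NE57 auto, Seiko C.3, Seiko CX3", re.IGNORECASE)
print(found)
```

Because the dot is escaped, "CX3" is not accepted as the model "C.3", and only the brand is reported for the last occurrence.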

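If you use non-capturing groups, you do not need the list comprehension; just call re.findall(regular_expression, description, re.IGNORECASE) with the pattern.

To match the phrases as whole words, consider adding word boundaries: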
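The difference comes from how re.findall reports results: with capturing groups it returns tuples of the groups, and with no capturing groups it returns the whole match. A minimal sketch on a made-up one-line input:

```python
import re

text = "Seiko NE57 auto"

# Capturing groups: findall returns one tuple of groups per match.
capturing = re.findall(r"(seiko) *(NE57 auto)?", text, re.IGNORECASE)

# Non-capturing groups only: findall returns the full matched text.
non_capturing = re.findall(r"(?:seiko)(?:\s+(?:NE57 auto))?", text, re.IGNORECASE)

print(capturing)
print(non_capturing)
```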

regular_expression = rf"\b(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?\b"
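A small sketch of what the word boundaries change, using a made-up input in which a brand name is a prefix of a longer word ("Seikosha") and a model number is a prefix of a longer number ("423420"):

```python
import re

brand_list = ['scurfa', 'seagull', 'seiko']
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']

core = rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?"
plain = core
bounded = rf"\b{core}\b"

# Hypothetical text chosen to trigger partial-word matches.
text = "Seikosha clock and Seiko 423420"

plain_matches = re.findall(plain, text, re.IGNORECASE)
bounded_matches = re.findall(bounded, text, re.IGNORECASE)

print(plain_matches)
print(bounded_matches)
```

With boundaries, "Seikosha" is skipped entirely and "42342" is no longer taken from inside "423420", so only the brand is reported for the second occurrence.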
