Regex to match all strings from multiple lists

Posted 2024-06-01 00:21:57


I'm using a regular expression to match all the strings in a list:

import re

brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']

regular_expression = rf"({'|'.join(brand_list)}) ([^\s]+)"

description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""

# The original read the text from soup_content (a parsed page not defined here); the sample description is used instead
print([" ".join(t) for t in re.findall(regular_expression, description, re.IGNORECASE)])

This gives me:

['SEIKO 44-9990', 'Seiko NE57', 'ScURFA 42342']

But I want to replace ([^\s]+) with ({'|'.join(model_list)}), built from this list:

model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']

so that I get output more like this:

['SEIKO 44-9990 Gold Medallion', 'Seiko NE57 auto', 'ScURFA 42342']

2 answers

You can use exactly the same approach as with the other regex:

regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

Output:

['SEIKO 44-9990 Gold Medallion', 'Seiko NE57 auto', 'ScURFA 42342']

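The " *" (a space followed by an asterisk) between the two joined lists matches zero or more spaces, so the brand and the model are matched whether or not there is a space between them.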
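A quick way to see what " *" does is to isolate it with tiny one-item lists. This is only an illustrative sketch: the names brands, models and samples are hypothetical and not part of the answer. It also shows that re.findall returns (brand, model) tuples here, which is why the answer joins them with " ".join(t).

```python
import re

# Hypothetical one-item lists, just to isolate the behaviour of " *"
brands = ["seiko"]
models = ["NE57 auto"]
pattern = rf"({'|'.join(brands)}) *({'|'.join(models)})?"

# " *" matches zero or more literal spaces, so all three spellings match
samples = ["Seiko NE57 auto", "Seiko   NE57 auto", "SeikoNE57 auto"]
found = [re.findall(pattern, s, re.IGNORECASE) for s in samples]
print(found)
# [[('Seiko', 'NE57 auto')], [('Seiko', 'NE57 auto')], [('Seiko', 'NE57 auto')]]

# A newline is not a space: the brand still matches, but the optional
# model group does not participate, so findall reports it as ''
newline_case = re.findall(pattern, "Seiko\nNE57 auto", re.IGNORECASE)
print(newline_case)  # [('Seiko', '')]
```

If the brand and model may be separated by tabs or line breaks, \s* (or \s+) is the more permissive choice.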

Edit:

The full code I used for testing:

import re

brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']

regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""

print([" ".join(t) for t in re.findall(regular_expression, description, re.IGNORECASE)])

Edit 2:

Added a trailing question mark so that the model group is optional.
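One side effect of making the model group optional is worth checking: a brand with no listed model still matches, and its model group comes back as an empty string, so " ".join(t) leaves a trailing space. The sketch below uses a made-up sample sentence (not from the question) and strips that space with str.strip(); this is one possible cleanup, not something the answer prescribes.

```python
import re

brand_list = ["scurfa", "seagull", "seiko"]
model_list = ["44-9990 Gold Medallion", "NE57 auto", "42342"]
regular_expression = rf"({'|'.join(brand_list)}) *({'|'.join(model_list)})?"

# Hypothetical text: one brand with a known model, one without
text = "Seiko NE57 auto and a Seagull chronograph"
matches = re.findall(regular_expression, text, re.IGNORECASE)
print(matches)  # [('Seiko', 'NE57 auto'), ('Seagull', '')]

# Without a model, " ".join gives 'Seagull '; strip() removes the extra space
joined = [" ".join(t).strip() for t in matches]
print(joined)  # ['Seiko NE57 auto', 'Seagull']
```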

You can use:

import re
brand_list = ['scurfa', 'seagull', 'seiko']
description = """
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Testing for a ScURFA 42342
"""
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']
regular_expression = rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?"
print(re.findall(regular_expression, description, re.IGNORECASE))

Output: ['SEIKO 44-9990 Gold Medallion', 'Seiko NE57 auto', 'ScURFA 42342']


The rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?" part builds the pattern (?:scurfa|seagull|seiko)(?:\s+(?:44-9990 Gold Medallion|NE57 auto|42342))?, which matches scurfa, seagull or seiko, then, optionally, one or more whitespace characters followed by 44-9990 Gold Medallion, NE57 auto or 42342.
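To confirm what the f-string expands to, you can simply print it. The extra fullmatch check is an illustration added here (the tab-separated sample is hypothetical): unlike the " *" in the first answer, \s+ accepts any whitespace, including tabs and newlines.

```python
import re

brand_list = ['scurfa', 'seagull', 'seiko']
model_list = ['44-9990 Gold Medallion', 'NE57 auto', '42342']
regular_expression = rf"(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?"

# Prints the expanded pattern:
# (?:scurfa|seagull|seiko)(?:\s+(?:44-9990 Gold Medallion|NE57 auto|42342))?
print(regular_expression)

# \s+ accepts any whitespace between brand and model, e.g. a tab
tab_ok = re.fullmatch(regular_expression, "Seiko\tNE57 auto", re.IGNORECASE) is not None
print(tab_ok)  # True
```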

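Because the pattern uses only non-capturing groups, re.findall returns the whole matched text, so no list comprehension is needed: call re.findall(regular_expression, description, re.IGNORECASE) directly with this pattern.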
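The difference between the two answers comes down to how re.findall reports matches: with capturing groups it returns a tuple of group texts per match, and with no capturing groups it returns the full match. The short patterns below are simplified stand-ins for the ones built from the lists. Note that re.IGNORECASE only affects matching; the returned text keeps the casing from the input.

```python
import re

text = "Seiko NE57 auto"

# With capturing groups, findall returns one tuple of group texts per match
with_groups = re.findall(r"(seiko) *(NE57 auto)?", text, re.IGNORECASE)
# With only non-capturing groups, findall returns the whole matched text
without_groups = re.findall(r"(?:seiko) *(?:NE57 auto)?", text, re.IGNORECASE)

print(with_groups)     # [('Seiko', 'NE57 auto')]
print(without_groups)  # ['Seiko NE57 auto']
```

One practical difference: the whole match keeps the separator exactly as it appears in the text (for example several spaces), while joining the groups always produces a single space.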

To match the phrases only as whole words, consider adding word boundaries:

regular_expression = rf"\b(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?\b"
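A further refinement, not part of either answer, is to pass each list item through re.escape (so characters like . or + in a model name are matched literally) and to sort the alternatives longest first, because regex alternation takes the first alternative that lets the match succeed. The sketch below adds a hypothetical model "NE57" that is a prefix of "NE57 auto" to show why the order matters; the helper name alternation and the sample text are also made up for illustration.

```python
import re

brand_list = ["scurfa", "seagull", "seiko"]
# Hypothetical: "NE57" added to show why longest-first ordering matters
model_list = ["NE57", "NE57 auto", "44-9990 Gold Medallion", "42342"]

def alternation(items):
    # Hypothetical helper: escape regex metacharacters and put longer
    # phrases first, so "NE57 auto" is tried before its prefix "NE57"
    return "|".join(re.escape(s) for s in sorted(items, key=len, reverse=True))

pattern = rf"\b(?:{alternation(brand_list)})(?:\s+(?:{alternation(model_list)}))?\b"
naive = rf"\b(?:{'|'.join(brand_list)})(?:\s+(?:{'|'.join(model_list)}))?\b"

# "Seikosha" is rejected by the word boundaries in both patterns
text = "Seikosha clock, a SEIKO 42342 and a Seiko NE57 auto."
improved = re.findall(pattern, text, re.IGNORECASE)
naive_result = re.findall(naive, text, re.IGNORECASE)
print(improved)      # ['SEIKO 42342', 'Seiko NE57 auto']
print(naive_result)  # ['SEIKO 42342', 'Seiko NE57']
```

With the list order from the question no model is a prefix of another, so the sorting only matters once the lists grow.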
