How to handle a multi-line pattern in Python regular expression matching

Posted 2024-05-26 09:18:44


I have a list of company names that I want to replace with the word "company". The list spans multiple lines.

cmp=re.compile(""" A | B |
                   C | D
               """)
text='A is a great company, so is B'
cmp.sub('company',text)

But it doesn't work. How can I fix this?

Edit:

The example above does not account for spaces inside the company names.

company1=re.compile(r"""Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas""")
company2=re.compile(r"""Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas""",re.VERBOSE)
text='AIG is a great company, so is Berkshire Hathaway'  
company1.sub('cmp',text) 
>>> 'AIG is a great company, so is cmp'
company2.sub('cmp',text) 
>>> 'cmp is a great company, so is Berkshire Hathaway'

1 answer
User
Answer 1 · Posted 2024-05-26 09:18:44

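You can treat this as a case for verbose mode (re.VERBOSE), which permits and ignores whitespace such as newlines inside the pattern: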

import re

cmp = re.compile(r""" A | B |
                   C | D
               """, re.VERBOSE)
text = 'A is a great company, so is B'
print(cmp.sub('company', text))

Output

company is a great company, so is company
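
To see why the flag matters, here is a minimal sketch that compiles the same multi-line pattern with and without re.VERBOSE. The variable names pattern, plain and verbose are illustrative only and do not appear in the question.

```python
import re

pattern = """ A | B |
              C | D
          """

# Without re.VERBOSE, every space and newline is a literal character,
# so the first alternative is " A " (space, A, space), not "A".
plain = re.compile(pattern)

# With re.VERBOSE, whitespace outside character classes is ignored,
# so the pattern is equivalent to "A|B|C|D".
verbose = re.compile(pattern, re.VERBOSE)

text = 'A is a great company, so is B'

print(plain.sub('company', text))    # text is unchanged: no alternative matches
print(verbose.sub('company', text))  # both A and B are replaced
```

In the non-verbose case nothing matches because the text never contains "A" or "B" surrounded by the exact whitespace written in the pattern.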

Space is contained in the company names. ... Any idea on how to fix this?

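We need to do something similar to CGI-escaping the space characters that appear in the names. Below is a regex-based approach that does not require decoding the encoded spaces afterwards. It rewrites each single space between two non-space characters as the character class [ ], which re.VERBOSE leaves intact: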

import re

companies = re.compile(re.sub(r"(?<=\S) (?=\S)", r"[ ]", """Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas"""), re.VERBOSE)

text = 'AIG is a great company, so is Berkshire Hathaway'

print(companies.sub('cmp', text))

Output

cmp is a great company, so is cmp
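
As a rough illustration, the sketch below first shows what the space-protecting substitution produces, then tries a different technique that is not the one used in the answer: keeping the names in a Python list and joining them with re.escape, which avoids re.VERBOSE altogether. The names raw, protected, names and joined are hypothetical and chosen only for this example.

```python
import re

raw = """Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas"""

# Step from the answer: a single space with a non-space character on
# each side becomes "[ ]", so verbose mode will not discard it.
# Indentation and newlines are not touched.
protected = re.sub(r"(?<=\S) (?=\S)", "[ ]", raw)
print(protected.splitlines()[0])

# Alternative sketch (an assumption, not the answer's method): store the
# names in a list, which can span as many lines as needed, and escape each.
names = [
    "Berkshire Hathaway", "Australia & New Zealand Bank",
    "Wells Fargo", "AIG",
    "Ind & Comm Bank of China", "BNP Paribas",
]
joined = re.compile("|".join(re.escape(name) for name in names))

text = 'AIG is a great company, so is Berkshire Hathaway'
print(joined.sub('cmp', text))
```

The list form also keeps regex metacharacters in a name from being interpreted as pattern syntax, since re.escape handles them.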
