Regular expression to match a given requirement

Posted 2024-06-16 10:55:25


I have a string:

s = '((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN'')/
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON'') / 
 "SalesVelocity"."OrderHeader"."#Opportunities" ))/
 ((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / 
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON''))'

It references tables as "SchemaName"."TableName"."ColumnName". I need to extract every table together with its schema, that is, "SalesVelocity"."OrderHeader" and "SalesVelocity"."Opportunity_1".

import re
pat = r'".*?\"\.".*?\"'             #See Note at the bottom of the answer
s = '((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN'')/
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"=''OPEN''))*
 (FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON'') / 
 "SalesVelocity"."OrderHeader"."#Opportunities" ))/
 ((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / 
 FILTER("SalesVelocity"."OrderHeader"."#Opportunities" 
 USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = ''WON''))'
match1 = re.findall(pat, s)
print(match1)

Its output is:

['"SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN))*(FILTER("SalesVelocity"."OrderHeader"', 
'"OpportunityRevenue" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN)/FILTER("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory"=OPEN))*(FILTER ("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"', 
'"OpportunityStatusCategory" = WON) / "SalesVelocity"."OrderHeader"', 
'"#Opportunities" ))/((1.0 * "SalesVelocity"."OrderHeader"', 
'"TotalSalesCycleOppty")  / FILTER("SalesVelocity"."OrderHeader"', 
'"#Opportunities" USING "SalesVelocity"."Opportunity_1"']

This is incorrect. Take the second value, for example:

('"#Opportunities" USING "SalesVelocity"."Opportunity_1"')

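My expression starts with ", then .*? for all characters until it reaches \", then a dot, then again ", then .*? for all characters until it reaches \".

What am I missing?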


3 Answers

Is this what you need?

import re

s = '''((FILTER( "SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN'))*(FILTER("SalesVelocity"."OrderHeader"."OpportunityRevenue" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN')/FILTER("SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"='OPEN'))*(FILTER ( "SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = 'WON') / "SalesVelocity"."OrderHeader"."#Opportunities" ))/((1.0 * "SalesVelocity"."OrderHeader"."TotalSalesCycleOppty")  / FILTER("SalesVelocity"."OrderHeader"."#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = 'WON'))'''

pat = r'"[^"]*?"\."[^"]*?"'             #See Note at the bottom of the answer

match1 = re.findall(pat, s)
print(match1)

Output:

['"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"']

The following regular expression (modified following Wiktor Stribizew's suggestion) should work: "[^"]*?\"\."[^"]*\"
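As a rough sketch of how that pattern could be used to list each table only once: the sample string below is a shortened stand-in for the question's string, and the names pat, matches and unique_tables are illustrative, not part of the original answer. The lazy quantifier and the backslashes before the quotes are dropped here because they make no difference with a negated class.

```python
import re

# Shortened sample in the same shape as the question's string (an assumption for brevity).
s = ('FILTER("SalesVelocity"."OrderHeader"."#Opportunities" '
     'USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory" = \'WON\') / '
     '"SalesVelocity"."OrderHeader"."TotalSalesCycleOppty"')

# [^"]* cannot cross a double quote, so each match is exactly
# two adjacent quoted names joined by a dot.
pat = re.compile(r'"[^"]*"\."[^"]*"')

matches = pat.findall(s)

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_tables = list(dict.fromkeys(matches))
print(unique_tables)
# ['"SalesVelocity"."OrderHeader"', '"SalesVelocity"."Opportunity_1"']
```

The column name is skipped because the scan resumes after each schema/table match, and a quoted column name is never directly followed by a dot.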

Understanding backtracking

  • ".*?" matches the shortest subsequence between a " and the next "
  • ".*?"\. matches the shortest subsequence between a " and the next ".

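So ".*?"\. matches "#Opportunities" USING "SalesVelocity"., because once "\. fails to match at the first closing quote, the engine backtracks into .*? and lets it consume more characters until a quote followed by a dot is found.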
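The behaviour described above can be reproduced on a small fragment. This is only an illustrative sketch; the fragment is cut from the question's string so that it starts at a column name, and the variable names are made up for the example.

```python
import re

# Fragment of the question's string, starting at a column name (chosen for illustration).
fragment = '"#Opportunities" USING "SalesVelocity"."Opportunity_1"'

# Lazy dot: the first closing quote is not followed by a dot, so the engine
# extends .*? past it until it finds a quote that is followed by a dot.
lazy = re.match(r'".*?"\.', fragment)
print(lazy.group())   # "#Opportunities" USING "SalesVelocity".

# Negated class: [^"]* can never step over a quote, so the attempt at
# position 0 fails outright instead of stretching.
strict = re.match(r'"[^"]*"\.', fragment)
print(strict)         # None
```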

A negative lookahead is more expressive, because it states exactly which token is not wanted:

"(?:(?!").)*"\.".*?"

Another workaround is to use an atomic group around .*?" :

"(?>.*?")\.".*?"

But in your case it is more efficient to use a negated character class, [^"]*, because it avoids backtracking.
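To check that the three alternatives agree, here is a small comparison sketch. The sample string and the dictionary names are assumptions made for the example. Note that Python's re module only accepts atomic groups, (?>...), from Python 3.11 on, so that pattern is added conditionally.

```python
import re
import sys

# Fragment starting at a column name, where the original pattern went wrong.
s = '"#Opportunities" USING "SalesVelocity"."Opportunity_1"."OpportunityStatusCategory"'

patterns = {
    "negated class": r'"[^"]*"\."[^"]*"',
    "negative lookahead": r'"(?:(?!").)*"\.".*?"',
}
# Atomic groups are only supported by re in Python 3.11 and later.
if sys.version_info >= (3, 11):
    patterns["atomic group"] = r'"(?>.*?")\.".*?"'

results = {name: re.findall(pat, s) for name, pat in patterns.items()}
for name, found in results.items():
    print(name, found)   # each prints ['"SalesVelocity"."Opportunity_1"']
```

One difference to keep in mind: . does not match a newline unless re.DOTALL is set, whereas [^"] does, so the variants can behave differently on multi-line input.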
