Here is a sample of the text I am working with:
6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?
A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors
B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors
C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors
D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors
Answer: D
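I am trying to match the whole question, including the answer options: everything from the question number to the word Answer.
This is my current regex: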
((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
searchCounter is just a variable that corresponds to the current question, 6 in this case. I think the problem has to do with newlines.
Edit: full source code
import re

searchCounter = 1
bookDict = {}
with open('StratMasterKey.txt', 'rt') as myfile:
    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
        result = question_pattern.search(line)
        if result != None:
            bookDict[searchCounter] = result[0]
            searchCounter += 1
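Your regex fails because you read the file line by line with for line in myfile:, while the pattern looks for a match inside a single multiline string. Replace for line in myfile: with contents = myfile.read(), then use result = question_pattern.search(contents) to get the first match, or result = question_pattern.findall(contents) to get multiple matches.

A note on the regex: I did not fix the whole pattern since, as you mentioned, that is outside the scope of this question. However, because the input is now one multiline string, you need to remove re.DOTALL; otherwise the trailing .* after Answer runs on to the end of the file instead of stopping at the end of the Answer line. Use [\s\S] where the pattern must match any character, including line breaks, and a plain . where it must match any character except a line break. Also, the lookahead is redundant: you can safely replace (?=Answer) with Answer. Finally, to check whether there is a match you can simply use if result:, and then get the whole match value with result.group().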
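The changes described above could be put together roughly as follows. This is a minimal sketch, not necessarily the answerer's original snippet: the helper name extract_questions and the two-question sample string are hypothetical, and it assumes questions are numbered consecutively starting at 1, as the searchCounter loop in the question implies.

```python
import re


def extract_questions(contents):
    """Map each question number to its text, up to the end of its Answer line.

    `contents` is the whole file as one string (e.g. from myfile.read()).
    Hypothetical helper; assumes questions are numbered 1, 2, 3, ...
    """
    book_dict = {}
    search_counter = 1
    while True:
        # [\s\S]*? lazily spans line breaks up to the first "Answer";
        # without re.DOTALL the final .* stops at the end of that line.
        question_pattern = re.compile(
            rf'(?<={search_counter}\) )[\s\S]*?Answer.*'
        )
        result = question_pattern.search(contents)
        if not result:
            break  # no question with this number: stop
        book_dict[search_counter] = result.group()
        search_counter += 1
    return book_dict


# Made-up sample in the same layout as the text in the question.
sample = (
    "1) First question?\n"
    "A) yes\n"
    "B) no\n"
    "Answer: A\n"
    "2) Second question?\n"
    "A) left\n"
    "B) right\n"
    "Answer: B\n"
)

questions = extract_questions(sample)
print(questions[2])
```

With the file from the question, the call would be extract_questions(myfile.read()) inside the with open('StratMasterKey.txt', 'rt') as myfile: block. As in the original pattern, the lookbehind keeps the question number itself out of the match. It also means a search for 6) could match inside 16), which is harmless here only because each search returns the first occurrence in the file.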