Using re.findall() to replace all matches

import json
import re
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"
f = open(filepath, 'r')
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
print myfile

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 9, in <module>
    myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python27\lib\re.py", line 229, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

import json
import re
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)
filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"
f = open(filepath, 'r')
myfile = f.read()
myfile2 = re.sub(regex, regex2, myfile)
print myfile

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 11, in <module>
    myfile2 = re.sub(regex, regex2, myfile)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 258, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Python27\lib\sre_parse.py", line 706, in parse_template
    s = Tokenizer(source)
  File "C:\Python27\lib\sre_parse.py", line 181, in __init__
    self.__next()
  File "C:\Python27\lib\sre_parse.py", line 183, in __next
    if self.index >= len(self.string):
TypeError: object of type '_sre.SRE_Pattern' has no len()

3 answers

User

Answer #1 · edited 2024-05-23 18:37:28

import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
myfile =  'foo"s bar'
myfile2 = regex.sub(lambda m: m.group().replace('"',"%",1), myfile)
print(myfile2)

User

Answer #2 · edited 2024-05-23 18:37:28

As mentioned in the comments, use re.sub():

myfile = re.sub(regex, replacement, f.read())

where replacement is the string that each match will be replaced with.
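A minimal sketch of what the repl argument accepts, using the question's pattern and a made-up sample string. It shows two things: a str replacement substitutes the entire match (so with this pattern the letters on either side of the quote disappear too), and a compiled pattern is not an accepted replacement, which is what the question's second traceback is complaining about.

```python
import re

regex = re.compile(r'([a-zA-Z]"[a-zA-Z])', re.S)  # the question's pattern
text = 'don"t stop'  # sample input, for illustration only

# A str replacement replaces the whole match, letters included.
whole_match = re.sub(regex, '%', text)
print(whole_match)  # do% stop

# A compiled pattern is neither a str nor a callable, so re.sub rejects it
# with a TypeError, as in the question's re.sub(regex, regex2, myfile).
raised = False
try:
    re.sub(regex, re.compile('%'), text)
except TypeError:
    raised = True
print(raised)  # True
```

Keeping the letters requires backreferences, a replacement function, or lookarounds, as the third answer explains.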

User

Answer #3 · edited 2024-05-23 18:37:28

If I understand your question correctly, you are trying to replace a quotation mark that sits between two letters with a percent sign.

There are several ways to do this with re.sub (re.findall does not perform any replacement at all, so your initial attempt was always doomed to fail).
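To make the difference concrete, here is a small sketch (the sample text is made up) of what re.findall actually returns: a new list of matches, or of group tuples when the pattern has more than one group, while the input string stays unchanged. Its signature is re.findall(pattern, string, flags=0), so the third positional argument is flags, not a replacement.

```python
import re

text = 'a"b and c"d'  # sample input, for illustration only

# findall returns a new list of matched substrings; `text` is never modified.
pairs = re.findall(r'[a-zA-Z]"[a-zA-Z]', text)
print(pairs)  # ['a"b', 'c"d']

# With two capture groups, each list item is a tuple of the captured groups.
groups = re.findall(r'([a-zA-Z])"([a-zA-Z])', text)
print(groups)  # [('a', 'b'), ('c', 'd')]

# The third positional parameter is flags (an int such as re.S), which is why
# passing a second pattern string there, as in the question, fails.
flagged = re.findall(r'[a-zA-Z]"[a-zA-Z]', text, re.S)
print(text)  # a"b and c"d
```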

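One simple way is to change the pattern so that each letter is captured in its own group, and then use a replacement string containing backreferences: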

pattern = re.compile('([a-zA-Z])\"([a-zA-Z])', re.S)
re.sub(pattern, r'\1%\2', text)
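A quick run of the backreference version on a made-up string. Note that the replacement is a raw string: in a plain literal, '\1' would be the control character chr(1) rather than a reference to group 1.

```python
import re

pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])', re.S)
text = 'it"s a"b"c'  # sample input, for illustration only

# \1 and \2 are the two captured letters, written back around the '%'.
fixed = re.sub(pattern, r'\1%\2', text)
print(fixed)  # it%s a%b"c
```

The second quote in a"b"c is left alone because the first match already consumed the b; the lookaround version at the end of this answer handles that case differently.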

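Another option is to use a replacement function instead of a replacement string. The function is called with a match object for each match found in the text, and its return value is used as the replacement: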

pattern = re.compile('[a-zA-Z]\"[a-zA-Z]', re.S)
re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)

(There are probably many other ways to write the lambda function. I like string formatting.)
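As one of those other ways, here is a sketch using a named function instead of a lambda; the name percent_between is hypothetical. Since the pattern always matches exactly three characters, the match text can be unpacked directly.

```python
import re

pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]', re.S)

def percent_between(match):
    # Hypothetical helper: match.group() is the three-character text, e.g. 'n"t'.
    first, _, last = match.group()
    return first + '%' + last

text = 'don"t and it"s'  # sample input, for illustration only
result = pattern.sub(percent_between, text)
print(result)  # don%t and it%s
```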

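The best approach, however, is probably to use a lookahead and a lookbehind in the pattern, which ensure that the quote sits between letters without actually matching those letters. This lets you use the plain string '%' as the replacement: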

pattern = re.compile('(?<=[a-zA-Z])\"(?=[a-zA-Z])', re.S)
re.sub(pattern, '%', text)

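This does have slightly different semantics from the other versions. In text like 'a"b"c', both quotes will be replaced, whereas the earlier code only replaces the first one. Hopefully that's an improvement!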
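A side-by-side sketch of that difference on 'a"b"c'. The consuming pattern uses up the b in its first match, so the second quote has no letter left in front of it; the lookarounds only inspect the neighbouring characters without consuming them.

```python
import re

text = 'a"b"c'

consuming = re.compile(r'([a-zA-Z])"([a-zA-Z])')
lookaround = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')

# The first match is 'a"b'; scanning resumes at the second quote, which fails.
first_only = consuming.sub(r'\1%\2', text)
print(first_only)  # a%b"c

# Each quote is checked independently, so both are replaced.
both = lookaround.sub('%', text)
print(both)  # a%b%c
```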
