Replacing a multi-line block of text in text files with Python
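I know how to replace strings in Python, but I'm having trouble with a block of text, probably because what I want to replace spans several lines instead of a single line.
I have a bunch of text files in which the same block of text appears repeatedly in several places: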
LIVEBLAH Information Provided By:
BLAH ONLINE
A division of Blahdeblah BlahBlah Information, Inc.
Washington, DC New York, NY Chicago, IL
Los Angeles, CA Miami, FL Dallas, TX
For Additional Information About LIVEBLAH, Call
1-800-XXX-XXXX
or Visit Us on the World Wide Web at
http://www.blahdeblah.com
I want to replace every occurrence of this block with "start body".
Here is the code I'm trying:
import os,glob
path = 'files'
key="""
LIVEBLAH Information Provided By:
BLAH ONLINE
A division of Blahdeblah BlahBlah Information, Inc.
Washington, DC New York, NY Chicago, IL
Los Angeles, CA Miami, FL Dallas, TX
For Additional Information About LIVEBLAH, Call
1-800-XXX-XXXX
or Visit Us on the World Wide Web at
http://www.blahdeblah.com"""
for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename, 'r') as f:
        # read entire file into file1
        file1 = f.read()
    # replace block of text with proper string
    file1 = file1.replace(key, "start body")
    # write into a new file
    with open(filename + '_new', 'w') as f:
        f.write(file1)
Can anyone tell me why replace() doesn't work on a block of text, and what I can do to make it work?
EDIT: I tried a different approach:
for filename in glob.glob(os.path.join(path, '*.txt_new_NEW_NEW_BLAH')):
    with open(filename, 'r') as f:
        # read entire file into file1
        file1 = f.read()
    # index() will raise an error if not found
    f1_start = file1.index('LIVEBLAH Information Provided By:')
    f1_end = file1.index('http://www.blahdeblah.com', f1_start)
    key = file1[f1_start:(f1_end + 25)]  # 25 is the length of the string 'http://www.blahdeblah.com'
    file1 = file1.replace(key, '\n' + "start body")
    with open(filename + '_TRIAL', 'w') as f:
        f.write(file1)
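This gives strange results. For some files it works fine. For others, it replaces only the string 'LIVEBLAH Information Provided By:' with 'start body' and leaves the rest of the block untouched. For still others, index() raises an error saying it can't find 'LIVEBLAH Information Provided By:', even though the string is clearly there. What is going on?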
1 Answer
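Tabs and line breaks are stored as '\t' and '\n', '\r\n' or '\r' (depending on the operating system or the editor that wrote the file), so two blocks that look identical on screen can still differ character by character. I suggest first getting a unicode dump of the text file (for example with repr()) and then using that exact string in the replace call. Otherwise you may mistake a tab for several spaces, and so on.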
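A minimal sketch of this idea, assuming Python 3. The file contents below are hypothetical: the block from the question with a tab, a trailing space and a non-breaking space (\xa0) slipped in, the kind of difference an editor does not show. repr() makes them visible. The replace_block helper is a hypothetical name and an alternative technique that goes beyond the answer above: it escapes each word of the key and lets any run of whitespace separate the words, so the match no longer depends on exact spacing or line endings.

```python
import re

# The block as typed in the question: ordinary spaces and "\n" line breaks.
KEY = """LIVEBLAH Information Provided By:
BLAH ONLINE
A division of Blahdeblah BlahBlah Information, Inc.
Washington, DC New York, NY Chicago, IL
Los Angeles, CA Miami, FL Dallas, TX
For Additional Information About LIVEBLAH, Call
1-800-XXX-XXXX
or Visit Us on the World Wide Web at
http://www.blahdeblah.com"""

# Hypothetical file contents: the same block with a tab, a trailing space
# and a non-breaking space (\xa0), which all look like plain spaces on screen.
file_text = (
    KEY.replace("DC New York", "DC\tNew York")
    .replace("BLAH ONLINE", "BLAH ONLINE ")
    .replace("Provided By:", "Provided\xa0By:")
)
file_text = "Some header\n" + file_text + "\nRest of the story"

# repr() spells out the hidden characters (\t, \xa0, \r, ...).
print(repr(file_text[:70]))

# An exact str.replace finds nothing, so the text comes back unchanged.
print(file_text.replace(KEY, "start body") == file_text)


def replace_block(text, key, replacement):
    # Hypothetical helper. re.escape() makes '.', '-', ':' etc. match
    # literally; r"\s+" (Unicode-aware for str patterns in Python 3)
    # absorbs spaces, tabs, \xa0 and any line ending between the words.
    pattern = r"\s+".join(re.escape(word) for word in key.split())
    return re.sub(pattern, replacement, text)


print(replace_block(file_text, KEY, "start body"))
```

In the loop from the question, the call would become file1 = replace_block(file1, key, "start body"). Because key.split() discards leading whitespace, the newline at the start of the question's key no longer has to match.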