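import re

# Blocks are separated by a blank line; the pattern below relies on that.
x = """1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
or something"""

def repl(matchobj):
    # One match is a whole block: number, timestamp, then the text lines.
    ll = matchobj.group().split("\n")
    # Keep the first two lines and join all text lines with spaces.
    return "\n".join(ll[:2] + [" ".join(ll[2:])])

print(re.sub(r"\b\d+\n\d+:\d+,\d+\b[\s\S]*?(?=\n{2}|$)", repl, x))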
There may be a faster way if regex is used, and it might also be simpler, but I wanted to do it with plain logic.
Code:
# Read the file and split it into lines.
with open("output.txt", "r") as f:
    inp = f.read().split("\n")

tempString = ""
output = []
for s in inp:
    if s:
        if any(c.isalpha() for c in s):
            # Text line: accumulate it into the current sentence.
            tempString = tempString + " " + s
        else:
            # Number or timestamp line: flush pending text, then keep the line.
            if tempString:
                output.append(tempString.strip())
                tempString = ""
            output.append(s)
    else:
        # Blank line: flush pending text and keep a separator line.
        if tempString:
            output.append(tempString.strip())
            tempString = ""
        output.append(" ")
if tempString:
    output.append(tempString.strip())

print("\n".join(output))
with open("newoutput.txt", "w") as out:
    out.write("\n".join(output))
Output:
1
17:02,111
Problem report related to 2 router
2
17:05,223
Restarting the systems
3
18:02,444
Must erase hard disk now due to compromised data
4
17:02,111
Problem report related to router
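from itertools import tee
import re

with open('ex.txt') as f, open('new.txt', 'w') as out:
    # temp is a second iterator over the same file, kept one line ahead of f.
    temp, f = tee(f)
    next(temp, None)
    pre = ''  # previous line; empty before the first iteration
    for line in f:
        # next(temp, '') returns '' after the last line instead of raising StopIteration,
        # so the final line is no longer silently dropped.
        if next(temp, '') != '\n' or re.match(r'^\d{2}:\d{2},\d{3}\s$', pre):
            out.write(line)
        pre = line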
1
17:02,111
Problem report related to
router
another line
2
17:05,223
Restarting the systems
3
18:02,444
Must erase hard disk
now due to compromised data
line 5
line 6
line 7
Demo:
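import re

def splitter(s):
    # Lazily yield each chunk that ends before a blank line or at the end of the string.
    for x in re.finditer(r"(.*?)(?=\n\n|$)", s, re.DOTALL):
        g = x.group(0)
        if g:  # skip the empty matches found at block boundaries
            yield g

with open('ex.txt') as f, open('new.txt', 'w') as out:
    for block in splitter(f.read()):
        # The capturing group keeps the timestamp line in the split result.
        first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block)
        out.write(first + second + third.replace('\n', ' '))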
Result:
1
17:02,111
Problem report related to router another line
2
17:05,223
Restarting the systems
3
18:02,444
Must erase hard disk now due to compromised data line 5 line 6 line 7
You can use re.sub with your own custom replacement function. This approach works well if and only if the file is consistent with the given example.
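To make the "if and only if" caveat concrete, here is a minimal sketch of the same re.sub technique applied to two inputs. The helper name `join_text_lines` and both sample strings are assumptions made up for illustration. When the blank line between blocks is missing, the lazy match runs to the end of the string and the second block is folded into the first.

```python
import re

PATTERN = r"\b\d+\n\d+:\d+,\d+\b[\s\S]*?(?=\n{2}|$)"

def join_text_lines(match):
    # re.sub calls this once per match and substitutes the returned string.
    lines = match.group().split("\n")
    # Keep number and timestamp, collapse the text lines into one.
    return "\n".join(lines[:2] + [" ".join(lines[2:])])

# Hypothetical input that follows the expected layout (blank line between blocks).
consistent = ("1\n17:02,111\nProblem report related to\nrouter\n\n"
              "2\n17:05,223\nRestarting the systems")
# Hypothetical input where the separating blank line is missing.
inconsistent = ("1\n17:02,111\nProblem report related to\nrouter\n"
                "2\n17:05,223\nRestarting the systems")

good = re.sub(PATTERN, join_text_lines, consistent)
bad = re.sub(PATTERN, join_text_lines, inconsistent)
print(good)
print("---")
print(bad)
```

In the second case the whole input is treated as a single block, which is why the input format needs to be checked before relying on this approach.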
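If you want to remove the extra lines:
For that, you can check two conditions for each line: the line is not followed by an empty line, or the line is preceded by a line that matches the regex
^\d{2}:\d{2},\d{3}\s$
To access the next line on each iteration, you can use itertools.tee to create a second iterator named temp from the main file object and call next on it, and use re.match to test the regex.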
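The lookahead rule can be sketched on an in-memory list of lines, which makes it easy to see which lines survive. The function name `keep_lines` and the sample data are assumptions for illustration, and `zip_longest` is used here in place of explicit `next` calls so that the last line is paired with an empty string.

```python
import re
from itertools import tee, zip_longest

TIMESTAMP = re.compile(r'^\d{2}:\d{2},\d{3}\s$')

def keep_lines(lines):
    """Return the lines that pass the two conditions described above."""
    current, ahead = tee(lines)
    next(ahead, None)  # move the second iterator one line ahead
    previous = ""
    kept = []
    # Pair every line with the line that follows it ('' after the last line).
    for line, following in zip_longest(current, ahead, fillvalue=""):
        if following != "\n" or TIMESTAMP.match(previous):
            kept.append(line)
        previous = line
    return kept

# Hypothetical sample: "extra\n" sits right before a blank line and is not the
# line directly after the timestamp, so the rule drops it.
sample = ["1\n", "17:02,111\n", "text\n", "extra\n", "\n", "2\n"]
kept = keep_lines(sample)
print("".join(kept))
```

Note that the rule only looks one line ahead and one line back, so it removes a single line before each blank line rather than every additional text line in a block.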
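If you want to join the rest to the third line:
If you want to join the lines that follow the third line onto the third line, you can use a regex to find all blocks that are followed by \n\n or by the end of the file ($). Then split each block at the line in the timestamp format and write the parts to the output file; note that you need to replace the newlines in the third part with spaces.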
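The split step works because re.split keeps the text matched by a capturing group in its result. A small sketch on one block, where the function name `join_after_timestamp`, the `maxsplit=1` guard and the sample block are assumptions added for illustration:

```python
import re

# The parentheses make re.split return the timestamp line as its own element.
TIMESTAMP_LINE = r'(\d{2}:\d{2},\d{3}\n)'

def join_after_timestamp(block):
    # maxsplit=1 yields exactly three parts when a timestamp line is present.
    parts = re.split(TIMESTAMP_LINE, block, maxsplit=1)
    if len(parts) != 3:
        return block  # no timestamp line found: leave the block unchanged
    head, stamp, text = parts
    # Only the text after the timestamp has its newlines replaced.
    return head + stamp + text.replace('\n', ' ')

block = "1\n17:02,111\nProblem report related to\nrouter\nanother line"
joined = join_after_timestamp(block)
print(joined)
```

The length check avoids the unpacking error that a three-name assignment would raise on a block without a timestamp line.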
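Note:
In this answer, the splitter function returns a generator, so the blocks are produced one at a time instead of being collected in a list, which helps when you deal with huge files. The demo still loads the whole file with f.read() before splitting it.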
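If memory is the concern, the blocks can also be produced without reading the whole file first. The following is a sketch of a different technique than the regex-based splitter: it groups consecutive non-blank lines with itertools.groupby. The names `iter_blocks` and `merge_text` and the sample lines are hypothetical, and each block is assumed to contain a number line, a timestamp line and at least one text line.

```python
from itertools import groupby

def iter_blocks(lines):
    """Yield each run of non-blank lines as a list, one block at a time."""
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield [line.rstrip("\n") for line in group]

def merge_text(block):
    # Keep number and timestamp, join the remaining text lines with spaces.
    return "\n".join(block[:2] + [" ".join(block[2:])])

# Any iterable of lines works here, including an open file object.
sample = ["1\n", "17:02,111\n", "Problem report related to\n", "router\n",
          "\n",
          "2\n", "17:05,223\n", "Restarting the systems\n"]
merged = [merge_text(block) for block in iter_blocks(sample)]
print("\n\n".join(merged))
```

Passing the file object itself to `iter_blocks` keeps only one block in memory at a time.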