Python: search a text file for a specific time range (sed -n equivalent)

Posted 2024-04-27 03:29:33


I am trying to write a Python script that outputs a specific time range from a log file (similar to the sed command listed below):

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log

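My Python script searches for fixed strings instead of a range, unlike the sed command above (I suspect I am doing something wrong but cannot find the mistake; please check the code below):

Please point out where the code should be changed and suggest improvements. Thanks!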

#!/usr/bin/python
import datetime, time, os, sys, re
from datetime import timedelta
counter = 0
avgtime = 0

now = datetime.datetime.utcnow()
pasttime = now - datetime.timedelta(minutes=5)

timestamp = now.strftime("%y%m%d")
fiveago   = now - timedelta(minutes=5,seconds=now.second)
current   = now.strftime("%Y-%m-%d %H:%M")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M")
pattern   = str(current + "|" + pasttime)

f = open('/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log', 'r')
for line in f:
        if "POST" in line:
                if re.search(pattern, line, re.IGNORECASE):
                        date = line.split(' ')[1]
                        time = line.split(' ')[14]
                        avgtime += int(time)
                        counter += 1
                        print(date,time)
f.close()

print(pattern)
print("Total amount of time: ",counter)
print("Total scan time: ",avgtime)
print("Average scan time: ",avgtime / counter)

3 answers

IIUC, you need to take the timestamp from each log line and check whether it falls inside your time range.

import datetime, sys
from datetime import timedelta

now = datetime.datetime.utcnow()

timestamp = now.strftime("%y%m%d")
fiveago   = now - timedelta(minutes=5, seconds=now.second)
# Include seconds so the strings compare correctly against the log timestamps
current   = now.strftime("%Y-%m-%d %H:%M:%S")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M:%S")

print("Start time:", pasttime, "End time:", current, "\n")

filename = '/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log'
with open(filename, 'r') as f:
    for line in f:
        if "POST" in line:
            fields = line.split(' ')
            date = fields[1]
            time = fields[2]  # index matches the sample input below; adjust for your log format
            logdatetime = date + " " + time

            # Zero-padded timestamps in the same format compare correctly as strings
            if pasttime <= logdatetime <= current:
                print("yes, within the interval :", logdatetime)

Input used for this:

POST 2017-01-26 20:23:20 XX
POST 2017-01-26 20:23:01 XC
POST 2017-01-26 20:23:02 CV
POST 2017-01-26 20:20:09 DAF
POST 2017-01-26 20:20:09 fASF
POST 2017-01-26 20:20:11 Sfas
POST 2017-01-26 20:20:01 fsAf
POST 2017-01-26 20:20:02 asf
POST 2017-01-26 20:20:03 asf
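
Comparing timestamps as strings only works while both sides are zero-padded and share exactly the same format. As a minimal sketch of the same idea with real `datetime` objects, assuming every line looks like the sample input above (method, date, time, then the rest): the helper name `lines_in_range`, the fixed end time, and the extra `old` line are made up for illustration.

```python
from datetime import datetime, timedelta

def lines_in_range(lines, start, end):
    """Yield (timestamp, line) for POST lines whose timestamp lies in [start, end]."""
    for line in lines:
        fields = line.split()
        # Assumed layout: fields[0] is the method, fields[1] the date, fields[2] the time
        if len(fields) < 3 or fields[0] != "POST":
            continue
        try:
            stamp = datetime.strptime(fields[1] + " " + fields[2], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines whose timestamp does not parse
        if start <= stamp <= end:
            yield stamp, line

sample = [
    "POST 2017-01-26 20:23:20 XX",
    "POST 2017-01-26 20:23:01 XC",
    "POST 2017-01-26 20:20:09 DAF",
    "POST 2017-01-26 20:17:59 old",  # illustrative line, not part of the input above
]
# Fixed end time so the example is reproducible; a real script would use the current time
end = datetime(2017, 1, 26, 20, 23, 5)
start = end - timedelta(minutes=5)
matches = [line for _, line in lines_in_range(sample, start, end)]
for line in matches:
    print(line)
```

With these values only the 20:23:01 and 20:20:09 lines fall inside the five-minute window; the first line is after the end and the last one is before the start.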

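The problem with your solution is that you only look for the two "edge times". In your 3-minute time frame example, these are 18:00 and 18:02.

What the sed command does is: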

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
  1. Iterate over the lines without printing them (-n)
  2. As soon as sed finds 2017-01-26 18:00, it starts printing every line
  3. As soon as sed finds 2017-01-26 18:02, it stops printing

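In your example, the regex pattern is:

    '2017-01-26 18:00|2017-01-26 18:02'

which only matches lines containing either 18:00 or 18:02. So what you can do is: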

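  1. Parse the date out of each line and compare it with the time range, as in Shijo's answer
  2. Emulate sed, as in theamk's answer, but beware: this only works if both "edge timestamps" are actually present in the file
  3. Pimp your regex so that it also matches the minutes in between: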

    pattern = "|".join([(now-timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M") for i in range(6)])
    

    This will yield, for example:

    '2016-01-26 18:00|2016-01-26 17:59|2016-01-26 17:58|2016-01-26 17:57|2016-01-26 17:56|2016-01-26 17:55'
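
As a rough sketch of how the joined pattern could be used for filtering: the fixed `now` value and the log lines below are invented for illustration, and `re.escape` is an extra precaution that the one-liner above does not include. Note that the granularity is whole minutes, so every line from a listed minute matches regardless of its seconds.

```python
import re
from datetime import datetime, timedelta

# Fixed "now" so the result is reproducible; a real script would use the current time
now = datetime(2017, 1, 26, 18, 0, 30)
minutes = [(now - timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M") for i in range(6)]
# re.escape guards against characters that are special in a regex
pattern = re.compile("|".join(re.escape(m) for m in minutes))

# Hypothetical log lines, only for demonstration
log = [
    "2017-01-26 17:54:59 POST /a",
    "2017-01-26 17:55:00 POST /b",
    "2017-01-26 17:58:12 POST /c",
    "2017-01-26 18:00:59 POST /d",
    "2017-01-26 18:01:00 POST /e",
]
selected = [line for line in log if pattern.search(line)]
print(selected)
```

Here the lines from 17:55, 17:58 and 18:00 are kept, while 17:54 and 18:01 fall outside the six listed minutes.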
    

I can't see where the problem is, but you asked for a sed equivalent of your command, so here is the exact translation to Python:

import sys, re
use = False
for line in open('/logfile.log'):
   if re.search('2017-01-26 18:00', line): use = True
   if use: sys.stdout.write(line)
   if re.search('2017-01-26 18:02', line): use = False
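
Like sed, this translation prints nothing if no line contains the start timestamp, and never stops if the end timestamp is missing. A possible variation that does not depend on the edge timestamps being present, assuming every line starts with a zero-padded `YYYY-MM-DD HH:MM` timestamp (the function name `between` and the log lines are hypothetical):

```python
def between(lines, start, end):
    """Yield lines whose leading 'YYYY-MM-DD HH:MM' prefix lies in [start, end]."""
    width = len(start)
    for line in lines:
        stamp = line[:width]
        # Zero-padded ISO-style timestamps sort lexicographically in time order
        if start <= stamp <= end:
            yield line

# Hypothetical log in which neither 18:00 nor 18:02 occurs
log = [
    "2017-01-26 17:59:58 a",
    "2017-01-26 18:01:10 b",
    "2017-01-26 18:01:45 c",
    "2017-01-26 18:03:02 d",
]
result = list(between(log, "2017-01-26 18:00", "2017-01-26 18:02"))
print(result)
```

The trade-off is that lines without a leading timestamp (continuation lines, stack traces) are unlikely to fall in the range and get dropped, whereas the sed-style version would print them while inside the range.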
