Extracting lines from a file using bash or Python

Posted 2024-04-25 14:19:14


This is the content of my file; it is the output of pflogsumm:

Host/Domain Summary: Messages Received 
---------------------------------------
 msg cnt   bytes   host/domain
 -------- -------  -----------
    415     5416k  abc.com
     13    19072   xyz.localdomain

Senders by message count
------------------------
    415   alert@example.com
     13   root@jelly.localdomain

Recipients by message count
---------------------------
    506   alert@apple.com            <= Extract from here to ...
     70   info@pafpro.org.us
     ..
     ...
     19   gems@gmail.com
     17   info@aol.com
     13   hemdem@gmail.com           <= Extract ends here

Senders by message size
-----------------------
   5416k  alert@google.com
...
 ...

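The sections of the output appear to be delimited by a title and a blank line, for example: Recipients by message count ... <contents of interest> ... blank line. I tried the sed expression below, but it returns every line after the matching string "Recipients by message count":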

sed -nr '/.*Recipients by message count/,/\n/ p'

Desired output: all the email addresses under "Recipients by message count".


3 answers

Using awk:

awk '/Recipients by message count/{p=1}!$0{p=0}p' input_file

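This prints the "Recipients by message count" section, starting at the heading line and stopping at the next empty line.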

Breakdown:

/Recipients by message count/ {p=1} # When /pattern/ is matched set p = 1
!$0 {p=0}                           # When input line is empty set p = 0
p                                   # Print line if p is true, short for:
                                    # p { print $0 }
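
For readers more comfortable with Python, the same flag-based idea can be sketched as below. This is only an illustration, not part of the answer: the function name extract_section and the sample text are made up for the example, and unlike awk's !$0 test it also treats whitespace-only lines as blank.

```python
def extract_section(lines, heading="Recipients by message count"):
    """Collect lines from the heading up to (not including) the next blank line."""
    printing = False              # plays the role of awk's p
    selected = []
    for line in lines:
        if heading in line:
            printing = True       # /pattern/ {p=1}
        if not line.strip():
            printing = False      # blank line: {p=0}
        if printing:
            selected.append(line.rstrip("\n"))
    return selected


# Hypothetical sample in the same layout as the pflogsumm output above.
sample = """Senders by message count
------------------------
    415   alert@example.com

Recipients by message count
---------------------------
    506   alert@apple.com
     70   info@pafpro.org.us

Senders by message size
-----------------------
   5416k  alert@google.com
""".splitlines()

section = extract_section(sample)
print("\n".join(section))
```

As with the awk one-liner, the heading and its dashed underline are part of the result; slicing with section[2:] would leave only the address lines.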

Using sed:

$ sed -n '/Recipients by message count/,/^\s*$/ p' data | sed -n '1!{2!{$!p}}'
    506   alert@apple.com            <= Extract from here to ...
     70   info@pafpro.org.us
     ..
     ...
     19   gems@gmail.com
     17   info@aol.com
     13   hemdem@gmail.com           <= Extract ends here
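
The two-stage pipeline above (select the range, then drop the first two lines and the last one) could be mirrored in Python with itertools. This is a rough sketch under assumptions: the function name recipients and the sample lines are hypothetical, and it is not a line-for-line port of the sed commands.

```python
from itertools import dropwhile, islice, takewhile


def recipients(lines, heading="Recipients by message count"):
    # Stage 1: discard everything before the heading (start of the sed range).
    from_heading = dropwhile(lambda l: heading not in l, lines)
    # Stage 2: skip the heading and its dashed underline (the 1! and 2! parts).
    body = islice(from_heading, 2, None)
    # Stage 3: stop at the first blank line, which is not emitted (the $! part).
    return [l.rstrip() for l in takewhile(lambda l: l.strip(), body)]


# Hypothetical sample in the same layout as the pflogsumm output.
sample = [
    "Senders by message count",
    "------------------------",
    "    415   alert@example.com",
    "",
    "Recipients by message count",
    "---------------------------",
    "    506   alert@apple.com",
    "     17   info@aol.com",
    "",
    "Senders by message size",
]

result = recipients(sample)
print("\n".join(result))
```

One difference worth noting: if the section runs to the end of the file with no trailing blank line, this sketch keeps the final address, whereas the second sed command would drop it as the last line.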

Like this:

    findthis = "Recipients by message count"

    with open("tst.dat") as f:
        for line in f:
            if findthis not in line:
                continue                 # keep scanning for the section heading
            next(f, None)                # skip the dashed underline below the heading

            for line in f:
                line = line.rstrip()     # strip trailing whitespace and the newline
                if line == "":           # a blank line ends the section
                    break
                print(line)

If the file is large, or if you need wildcard matching, use the regular expression library (the re module).
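
A minimal sketch of what the regular-expression route might look like, assuming the whole report fits in memory as one string. The pattern, the name recipient_lines, and the sample text are illustrative assumptions rather than anything given in the answer.

```python
import re

SECTION = re.compile(
    r"^Recipients by message count[ \t]*\n"   # heading line
    r"-+[ \t]*\n"                             # dashed underline
    r"((?:[ \t]*\S.*(?:\n|\Z))+)",            # one or more non-blank lines
    re.MULTILINE,
)


def recipient_lines(text):
    """Return the non-blank lines that follow the heading and its underline."""
    match = SECTION.search(text)
    if match is None:
        return []
    return [line.rstrip() for line in match.group(1).splitlines()]


# Hypothetical sample in the same layout as the pflogsumm output.
text = """Senders by message count
------------------------
    415   alert@example.com

Recipients by message count
---------------------------
    506   alert@apple.com
     17   info@aol.com

Senders by message size
-----------------------
   5416k  alert@google.com
"""

found = recipient_lines(text)
print("\n".join(found))
```

The heading is hard-coded in the pattern here; for wildcard-style matching it could be replaced by any other regular expression.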
