Extracting multi-line blocks based on regex assertion conditions

Posted 2024-03-28 18:20:13


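I have a large text file from which I need to extract specific blocks of data, based on special conditions found in the lines that follow each block's first line. How can I find these blocks and extract them using Python regular expressions?

The sample file (source.txt) looks like this: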

    .
    .
    .
    Request: 22:11:22
    Discription1: From the Client 1
    Discription2: requesting HTTP
    Version: 1.2
    Type: browsing
    Data: AAAA CFFFF FFF

    Answer: 33:22:44
    Discription1: From Server B
    Discription2: Respons HHTP
    Version: 1.1
    Type: browsing
    Data: kCmkc9AS 9as9 as99 as76d 8aS9d8 6ASDQWv sf

    Request: 31:24:53:33
    Discription1: From Client 2
       Discription2: requesting HTTP
        Version: 1.1

    Type: DASH
    Data: AAAA CFFFF FFF



    Answer: 41:24:33:33
    Discription1: From Server A
    Discription2: Response
    Version: 1.1
    Type: DASH
    Data:ask sef k5q3 WEB 54 fkl n5 qwe@#%@#SDG adkjwra;k4 kfk

    Request: 61:44:23:33
    Discription1: From Client 2
        Discription2: requesting HTTP

    Version: 1.1
       Type: DASH

    Data: AAAA CFFFF FFF
    Data Discription: From the Cleint VM2
    Answer: 71:25:33:33
      Discription1: From Server A
     Discription2: Response
        Version: 1.1
    Type: DASH

    Data:ask sef k5q3 WEB 54 fkl n5 qwe@#%@#SDG adkjwra;k4 kfk
    .
    .

I need to get the blocks that start with "Request:" and have the properties "Version: 1.1" and "Client 2".


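Important notes

  1. The blocks differ in length, so they do not contain the same amount of information, but they share the same matching characteristics.

  2. There are many spaces and newlines between them.

  3. The matching characteristics may not appear in a fixed line order.

  4. I need to capture each block up to the following "Answer" keyword.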


The expected output is:

    Request: 31:24:53:33
    Discription1: From Client 2
    Discription2: requesting HTTP
    Version: 1.1
    Type: DASH
    Data: AAAA CFFFF FFF

    Request: 61:44:23:33
    Discription1: From Client 2
    Discription2: requesting HTTP
    Version: 1.1
    Type: DASH
    Data: AAAA CFFFF FFF
    Data Discription: From the Cleint VM2
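One way to meet these requirements without a single large pattern is to split the text at every `Request:`/`Answer:` header line and then test each `Request:` block on its own. The sketch below assumes that every block begins with a (possibly indented) `Request:` or `Answer:` line; the helper names `split_blocks`, `wanted` and `normalize` are hypothetical, and `SAMPLE` is a shortened copy of the question's data standing in for the real `source.txt`.

```python
import re

# Shortened copy of the question's sample (irregular indentation and blank
# lines kept on purpose); in practice the text would be read from source.txt.
SAMPLE = """\
    Request: 22:11:22
    Discription1: From the Client 1
    Version: 1.2
    Data: AAAA CFFFF FFF

    Answer: 33:22:44
    Discription1: From Server B
    Version: 1.1

    Request: 61:44:23:33
    Discription1: From Client 2
        Discription2: requesting HTTP

    Version: 1.1
       Type: DASH

    Data: AAAA CFFFF FFF
    Data Discription: From the Cleint VM2
    Answer: 71:25:33:33
      Discription1: From Server A
        Version: 1.1
"""

# A header line starts a new block: optional indentation, then Request: or Answer:
HEADER = re.compile(r"^[ \t]*(Request|Answer):", re.M)


def split_blocks(text):
    """Yield (kind, block_text) for each block, from its header to the next header."""
    headers = list(HEADER.finditer(text))
    for i, m in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        yield m.group(1), text[m.start():end]


def wanted(block):
    """True if the block mentions Client 2 and has a Version: 1.1 line, in any order."""
    return (re.search(r"\bClient 2\b", block) is not None
            and re.search(r"^[ \t]*Version:[ \t]*1\.1\b", block, re.M) is not None)


def normalize(block):
    """Strip indentation and drop blank lines."""
    return "\n".join(line.strip() for line in block.splitlines() if line.strip())


matches = [normalize(b) for kind, b in split_blocks(SAMPLE)
           if kind == "Request" and wanted(b)]
for block in matches:
    print(block, end="\n\n")
```

Because each condition is a separate `re.search` over the whole block, the line order (note 3) and stray blank lines (note 2) do not matter, and each block ends right before the next `Answer:` line (note 4). A `Request:` block at the very end of the file would run to the end of the text, including any trailing lines.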

1 answer

You can use negative lookaheads to assert the values on the following lines:

    ^Message Request: .*(?:\r?\n(?!.* Client 2|Data:).*)*\r?\n.*Client 2.*(?:\r?\n(?!Version: 1\.1).*)*\r?\nVersion: 1\.1(?:\n(?!Data:).*)*\r?\nData: .*

Explanation

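  • `^` Start of line
  • `Message Request: .*` Match `Message Request:` and the rest of the line
  • `(?:\r?\n(?!.* Client 2|Data:).*)*` Match following lines as long as they do not contain ` Client 2` and do not start with `Data:`
  • `\r?\n.*Client 2.*` Match the line that contains `Client 2`
  • `(?:\r?\n(?!Version: 1\.1).*)*` Match following lines as long as they do not start with `Version: 1.1`
  • `\r?\nVersion: 1\.1` Match the line that starts with `Version: 1.1`
  • `(?:\n(?!Data:).*)*` Match following lines as long as they do not start with `Data:`
  • `\r?\nData: .*` Match the line that starts with `Data:`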
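To see the repeated-line building block in isolation, here is a small sketch on made-up strings (`text` and `text2` are illustrative, not from the question). It keeps consuming whole lines while a negative lookahead placed right after each newline holds, and it also shows why `re.M` matters: without it, `^` only matches at the very start of the string.

```python
import re

# Made-up input: the Version line comes first, the Data line later
text = "Version: 1.1\nType: DASH\nData: AAAA\nExtra: line"

# Match the Version line, then any number of whole lines that do NOT start
# with "Data:" (negative lookahead right after each newline), then the Data line
pattern = r"^Version: 1\.1(?:\n(?!Data:).*)*\nData: .*"

m = re.search(pattern, text, re.M)
print(repr(m.group(0)))  # 'Version: 1.1\nType: DASH\nData: AAAA'

# Here the Version line is not at the start of the string
text2 = "Type: DASH\nVersion: 1.1\nData: X"
print(re.search(pattern, text2))  # None: without re.M, ^ only matches at position 0
print(re.search(pattern, text2, re.M).group(0))  # prints the lines "Version: 1.1" and "Data: X"
```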


For example, using `re.M`:

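    import re

    # Pattern from above: a "Message Request:" block that contains "Client 2",
    # then a "Version: 1.1" line, up to and including the next "Data:" line
    regex = r"^Message Request: .*(?:\r?\n(?!.* Client 2|Data:).*)*\r?\n.*Client 2.*(?:\r?\n(?!Version: 1\.1).*)*\r?\nVersion: 1\.1(?:\n(?!Data:).*)*\r?\nData: .*"

    with open("source.txt", "r") as f:
        text1 = f.read()

    # re.M (re.MULTILINE) lets ^ match at the start of every line, not only at the start of the file
    print(re.findall(regex, text1, re.M))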

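Result (this output was produced from test data that differs from the sample above: there, each block begins with `Message Request:` and the second block has an extra `Description0` line):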

['Message Request: 31:24:53:33\nDiscription1: From Client 2\nDiscription2: requesting HTTP\nVersion: 1.1\nType: DASH\nData: AAAA CFFFF FFF', 'Message Request: 61:44:23:33\nDescription0:jdfj sdjd\nDiscription1: From Client 2\nDiscription2: requesting HTTP\nVersion: 1.1\nType: DASH\nData: AAAA CFFFF FFF']
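Applied to the question's sample, the pattern above returns an empty list: it requires `Message Request:` at the very start of a line, while the question's blocks start with `Request:`. It also stops at the first `Data:` line, so the `Data Discription:` line from the expected output is not captured. The sketch below is a different single-pattern variant (my own, not the answerer's): it anchors on an optionally indented `Request:` line, checks both conditions with lookaheads that cannot see past the next `Request:` or `Answer:` header, and then consumes the block up to that header. `SAMPLE`, `NOT_HEADER` and `BLOCK_RE` are illustrative names, and `SAMPLE` is a shortened copy of the question's data.

```python
import re

# Shortened copy of the question's sample, indentation and blank lines kept
SAMPLE = """\
    Request: 22:11:22
    Discription1: From the Client 1
    Version: 1.2
    Type: browsing
    Data: AAAA CFFFF FFF

    Answer: 33:22:44
    Discription1: From Server B
    Version: 1.1

    Request: 31:24:53:33
    Discription1: From Client 2
       Discription2: requesting HTTP
        Version: 1.1

    Type: DASH
    Data: AAAA CFFFF FFF



    Answer: 41:24:33:33
    Discription1: From Server A
    Version: 1.1
"""

# One character that is not the start of a (possibly indented) header line
NOT_HEADER = r"(?:(?!^[ \t]*(?:Request|Answer):)[\s\S])"

BLOCK_RE = re.compile(
    r"^[ \t]*Request:"                                   # block header
    + rf"(?={NOT_HEADER}*?\bClient 2\b)"                 # condition 1, anywhere in the block
    + rf"(?={NOT_HEADER}*?^[ \t]*Version:[ \t]*1\.1\b)"  # condition 2, anywhere in the block
    + rf"{NOT_HEADER}*",                                 # body up to the next header
    re.M,
)

matches = [
    # Strip indentation and drop the blank lines the body picked up
    "\n".join(line.strip() for line in m.group(0).splitlines() if line.strip())
    for m in BLOCK_RE.finditer(SAMPLE)
]
print(matches)
```

The lookaheads run before the body is consumed, so the two conditions can appear on any line and in any order within a block. Using `[\s\S]` instead of `.` with `re.S` lets the body span newlines while `re.M` keeps `^` meaning "start of line".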
