How to print only the parts of a data file that match a pattern in Python

####<Jun 4, 2016 12:05:50 PM IST> <Debug> <MessagingBridgeRuntimeVerbose> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1465022150722> <BEA-000000> <Bridge NPGBridge doTrigger(): state = 4 stopped = false>
####<Jun 4, 2016 12:05:50 PM IST> <Error> <ALSB Logging> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1465022150886> <BEA-000000> < [PipelinePairNode1, PipelinePairNode1_request, CreateVASReportingStage, REQUEST] *** CreateVASWrapper Reprting Stage VAS V-3.0 ***: <soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <VASProxyType xmlns="http://xmlns.aircel.com/AircelTransformation/ProxyService/OrderProxy/1.0/CreateVASSubscriptionConsumerSchema"> <TransactionId>DATA030620160431128801011429ADD</TransactionId> <msisdn>8801011429</msisdn> <productCode>DATA</productCode> <action>ADD</action> <IMSI>405801124044563</IMSI> <SubsType>PrePaid</SubsType> </VASProxyType> </soap:Body>>
####<Jun 4, 2016 12:05:50 PM IST> <Error> <ALSB Logging> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1465022150889> <BEA-000000> < [PipelinePairNode1, PipelinePairNode1_request, Authentication, REQUEST] ***REQUEST FOR VAS V-3.0 ****: <soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <VASProxyType xmlns="http://xmlns.aircel.com/AircelTransformation/ProxyService/OrderProxy/1.0/CreateVASSubscriptionConsumerSchema"> <TransactionId>DATA030620160431128801011429ADD</TransactionId>

2 Answers

User

#1 · edited 2024-05-13 23:56:18

As suggested in the comments, your XML is not valid. It is best to make sure the XML is valid first and then use a parser such as [etree][1] or [Beautiful Soup][2].
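As a rough sketch of the parser route, assuming you have already cut one complete <soap:Body> ... </soap:Body> fragment out of the log (the second block in the question's sample is truncated and would not parse), the standard library's xml.etree.ElementTree can read the fields directly. The helper local_name and the fields dict are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# One complete <soap:Body> fragment, copied from the log in the question.
fragment = (
    '<soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<VASProxyType xmlns="http://xmlns.aircel.com/AircelTransformation/'
    'ProxyService/OrderProxy/1.0/CreateVASSubscriptionConsumerSchema">'
    '<TransactionId>DATA030620160431128801011429ADD</TransactionId>'
    '<msisdn>8801011429</msisdn>'
    '<productCode>DATA</productCode>'
    '<action>ADD</action>'
    '<IMSI>405801124044563</IMSI>'
    '<SubsType>PrePaid</SubsType>'
    '</VASProxyType>'
    '</soap:Body>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


root = ET.fromstring(fragment)
# Collect every leaf element (one with no children) as name -> text.
fields = {local_name(el.tag): el.text for el in root.iter() if len(el) == 0}
print(fields['msisdn'], fields['action'], fields['SubsType'])
```

Because the elements live in a default namespace, ElementTree reports tags as '{namespace}name'; stripping that prefix keeps the lookups short, at the cost of ignoring which namespace a tag came from.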

But if you would rather use a regex, you can try:

import re

mytext = [
    '####<Jun 4, 2016 12:05:50 PM IST> <Debug> <MessagingBridgeRuntimeVerbose> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: \'13\' for queue: \'weblogic.kernel.Default (self-tuning)\'> <<WLS Kernel>> <> <> <1465022150722> <BEA-000000> <Bridge NPGBridge doTrigger(): state = 4 stopped = false>',
    '####<Jun 4, 2016 12:05:50 PM IST> <Error> <ALSB Logging> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: \'13\' for queue: \'weblogic.kernel.Default (self-tuning)\'> <<anonymous>> <> <> <1465022150886> <BEA-000000> < [PipelinePairNode1, PipelinePairNode1_request, CreateVASReportingStage, REQUEST] *** CreateVASWrapper Reprting Stage VAS V-3.0 ***: <soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">',
    '<VASProxyType xmlns="http://xmlns.aircel.com/AircelTransformation/ProxyService/OrderProxy/1.0/CreateVASSubscriptionConsumerSchema">',
    '    <TransactionId>DATA030620160431128801011429ADD</TransactionId>',
    '    <msisdn>8801011429</msisdn>',
    '    <productCode>DATA</productCode>',
    '    <action>ADD</action>',
    '    <IMSI>405801124044563</IMSI>',
    '    <SubsType>PrePaid</SubsType>',
    '</VASProxyType>',
    '</soap:Body>',
    '<Jun 4, 2016 12:05:50 PM IST> <Error> <ALSB Logging> <ggneai29> <AircelESB_MS1> <[ACTIVE] ExecuteThread: \'13\' for queue: \'weblogic.kernel.Default (self-tuning)\'> <<anonymous>> <> <> <1465022150889> <BEA-000000> < [PipelinePairNode1, PipelinePairNode1_request, Authentication, REQUEST] ***REQUEST FOR VAS V-3.0 ****: <soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">',
    '    <VASProxyType xmlns="http://xmlns.aircel.com/AircelTransformation/ProxyService/OrderProxy/1.0/CreateVASSubscriptionConsumerSchema">',
    '        <TransactionId>DATA030620160431128801011429ADD</TransactionId>',
]

searches = [
    {
        # "if_in" is a cheap substring test; "search" is the regex applied when it passes.
        "if_in": "<[ACTIVE] ExecuteThread:",
        "search": r"<\[ACTIVE[^<>]+> <<WLS Kernel>> <> <> <\d+>",
    },
    {
        "if_in": "PipelinePairNode1, PipelinePairNode1_request, Create",
        "search": r"< \[PipelinePairNode1, PipelinePairNode1_request, Create[^\[\]]+\]",
    },
    {
        "if_in": "CreateVASWrapper Reprting Stage VAS",
        "search": "CreateVASWrapper Reprting Stage VAS[^*]+",
    },
    {
        "if_in": "<TransactionId>",
        "search": "(?<=<TransactionId>)[^<>]+",
    },
    {
        "if_in": "<msisdn>",
        "search": "(?<=<msisdn>)[^<>]+",
    },
    {
        "if_in": "<action>",
        "search": "(?<=<action>)[^<>]+",
    },
    {
        "if_in": "<IMSI>",
        "search": "(?<=<IMSI>)[^<>]+",
    },
    {
        "if_in": "<SubsType>",
        "search": "(?<=<SubsType>)[^<>]+",
    },
]

result = ""
found_once = []  # "if_in" keys that have already produced a match

for item in mytext:
    for search in searches:
        # Report each search only once, for the first line that matches it.
        if search['if_in'] in item and search['if_in'] not in found_once:
            f = re.findall(search['search'], item)
            if f:
                result += f[0] + " "
                found_once.append(search['if_in'])

print(result)

If you want to find something else, add it to searches.

The result is:

<[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1465022150722> < [PipelinePairNode1, PipelinePairNode1_request, CreateVASReportingStage, REQUEST] CreateVASWrapper Reprting Stage VAS V-3.0  DATA030620160431128801011429ADD 8801011429 ADD 405801124044563 PrePaid
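For example, picking up the productCode value would only need one more entry of the same shape. The sketch below is an illustration rather than part of the original script: extra_search is a hypothetical name, and the entry is tested against a single line so that it runs on its own.

```python
import re

# Hypothetical extra entry, following the same shape as the ones in `searches`.
extra_search = {
    "if_in": "<productCode>",
    "search": r"(?<=<productCode>)[^<>]+",
}

line = '    <productCode>DATA</productCode>'

match = None
if extra_search["if_in"] in line:
    # The lookbehind keeps the opening tag out of the match itself.
    match = re.search(extra_search["search"], line)

if match:
    print(match.group(0))
```

Appending this dict to searches would make the loop above report the product code the first time a line containing <productCode> is seen.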

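User

#2 · edited 2024-05-13 23:56:18

The standard way to handle this kind of problem is to write some sort of "event-based" parser (like a SAX XML parser): the parser reads the file line by line (there is no need to load the whole file into memory), scans each line according to its own rules (this is where you might use regexps, although plain string methods sometimes work just as well), and, depending on the line's content, emits a given "event" (to be handled by a callback method) together with the associated data.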

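In your case there would be one event for the line that starts an interesting block of data (a line beginning with "####"), another event for lines containing XML data, and one more for the last line of the block (the line containing the closing "</soap:Body>" tag), something like this: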

class Parser(object):

    def parse(self, logfile):
        self.in_block = False
        for line in logfile:
            if self.is_block_start(line):
                self.in_block = True
                self.handle_block_start(line)
            elif self.in_block:
                if self.is_data(line):
                    self.handle_data(line)
                elif self.is_block_end(line):
                    self.in_block = False
                    self.handle_block_end(line)
            else:
                continue

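    def is_block_start(self, line):
        # return True if `line` opens a block of interest
        raise NotImplementedError

    def is_data(self, line):
        # return True if `line` carries XML data
        raise NotImplementedError

    def is_block_end(self, line):
        # return True if `line` closes the current block
        raise NotImplementedError

    def handle_block_start(self, line):
        # your code here
        pass

    def handle_data(self, line):
        # your code here
        pass

    def handle_block_end(self, line):
        # your code here
        pass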
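To make the skeleton concrete, here is a minimal self-contained sketch with the hooks filled in. It rests on assumptions that the answer does not state: a block starts at a line beginning with "####", ends at a line containing "</soap:Body>", and each data line holds exactly one simple <tag>value</tag> element. The class name VasLogParser, the records list, and the abridged sample lines are invented for the illustration.

```python
import re


class VasLogParser(object):
    """Event-style parser: builds one dict per block that reaches its end line."""

    # Assumption: a data line holds exactly one simple <tag>value</tag> element.
    DATA_RE = re.compile(r'^\s*<(\w+)>([^<>]*)</\1>\s*$')

    def __init__(self):
        self.records = []     # finished blocks
        self.current = None   # block currently being collected
        self.in_block = False

    def parse(self, lines):
        for line in lines:
            if self.is_block_start(line):
                self.in_block = True
                self.handle_block_start(line)
            elif self.in_block:
                if self.is_data(line):
                    self.handle_data(line)
                elif self.is_block_end(line):
                    self.in_block = False
                    self.handle_block_end(line)
        return self.records

    def is_block_start(self, line):
        return line.startswith('####')

    def is_data(self, line):
        return self.DATA_RE.match(line) is not None

    def is_block_end(self, line):
        return '</soap:Body>' in line

    def handle_block_start(self, line):
        # A new header discards any unfinished block (e.g. a Debug entry).
        self.current = {}

    def handle_data(self, line):
        tag, value = self.DATA_RE.match(line).groups()
        self.current[tag] = value

    def handle_block_end(self, line):
        self.records.append(self.current)
        self.current = None


# Abridged sample in the one-element-per-line layout assumed above.
sample = [
    "####<Jun 4, 2016 12:05:50 PM IST> <Debug> <MessagingBridgeRuntimeVerbose>",
    "####<Jun 4, 2016 12:05:50 PM IST> <Error> <ALSB Logging>",
    "<VASProxyType>",
    "    <msisdn>8801011429</msisdn>",
    "    <action>ADD</action>",
    "</VASProxyType>",
    "</soap:Body>",
]

records = VasLogParser().parse(sample)
print(records)
```

Since parse only iterates over its argument, an open file object can be passed in place of the list, which keeps memory use flat for large logs. Blocks that never reach a "</soap:Body>" line, such as the Debug entry, are dropped when the next header arrives.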
