Regular expression to extract only the CustomerId and data (bytes) and store them in a list?

Posted 2024-04-25 05:16:48


I have the following log:

01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A   
01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B  

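Can someone give me a regular expression that extracts the customer id and the size, stores them in a list, and prints the amount of data downloaded by each customer id? I can already do this in Python with search and a dictionary. What I am asking for is the regular expression.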


3 answers

In this example I used a data.txt input test file containing the two input lines pasted three times:

Python:

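import re

data = {}
# Lazy .*? stops at the first "Size:" that follows the customer id
regex = re.compile(r'CustomerId:(\d+).*?Size:(\d+)')

with open('data.txt') as fh:
    for line in fh:
        m = regex.search(line)

        # search() returns None when a line has no match, so test m itself
        if m:

            cust = m.group(1)
            size = m.group(2)

            # Add to the running total, creating the entry on first sight
            try:
                data[cust] += int(size)
            except KeyError:
                data[cust] = int(size)

print(data)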

Output:

{'1234': 16296, '1237': 16296}
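
The try/except accumulation can also be written with collections.defaultdict. This is only a minimal sketch: the function name total_bytes_per_customer and the in-memory sample list are illustrative assumptions, not part of the original answer, and lines that lack either field are simply skipped.

```python
import re
from collections import defaultdict

# Same pattern as in the answer; the lazy .*? stops at the first "Size:" after the id.
PATTERN = re.compile(r'CustomerId:(\d+).*?Size:(\d+)')

def total_bytes_per_customer(lines):
    """Return {customer_id: total_bytes} for an iterable of log lines."""
    totals = defaultdict(int)  # missing keys start at 0
    for line in lines:
        m = PATTERN.search(line)
        if m:  # skip lines without both fields
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)

# Hypothetical sample: the two log lines from the question.
sample = [
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A",
    "01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B",
]

print(total_bytes_per_customer(sample * 3))
```

Passing an open file handle instead of the list should work the same way, since the function only iterates over lines.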

Perl:

use warnings;
use strict;

use Data::Dumper;

open my $fh, '<', 'data.txt' or die $!;

my %data;

while (my $line = <$fh>){
    if (my ($cust, $size) = $line =~ /CustomerId:(\d+).*?Size:(\d+)/){
        $data{$cust} += $size;
    }
}

print Dumper \%data;

Output:

$VAR1 = {
      '1234' => 16296,
      '1237' => 16296
};

Here is what I would do:

In [1]: import collections, re

In [2]: d = collections.defaultdict(list)

In [3]: string = "01-01-2012 01:13:36 Blah blah : blah CustomerId:1234 downloaded Blah Size:5432 bytes Carrier:Company-A\n01-01-2012 01:13:36 Blah blah : blah CustomerId:1237 downloaded Blah Size:5432 bytes Carrier:Company-B"

In [4]: for cust_id, sz in re.findall(r".*CustomerId\:(\d+).*Size:(\d+)", string):
    ...:     d[cust_id].append(sz)
    ...:

In [5]: d
Out[5]: defaultdict(list, {'1234': ['5432'], '1237': ['5432']})
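
The dictionary in Out[5] holds each size as a string inside a list, so printing the amount downloaded per customer, as the question asks, still needs a conversion and a sum. A rough sketch, assuming d has exactly the shape shown in Out[5]; the name totals is illustrative:

```python
import collections

# Rebuild the shape shown in Out[5] so this snippet runs on its own.
d = collections.defaultdict(list, {'1234': ['5432'], '1237': ['5432']})

# Convert each size string to int and sum the sizes per customer.
totals = {cust_id: sum(int(sz) for sz in sizes) for cust_id, sizes in d.items()}

for cust_id, total in totals.items():
    print("Customer %s downloaded %d bytes" % (cust_id, total))
```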
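
#!/usr/bin/env python3

import re

res = dict()

# Read all log lines; the with block closes the file afterwards
with open("log.txt") as fh:
    data = fh.readlines()

for line in data:
    m = re.search(r"CustomerId:([0-9]+).*Size:([0-9]+)", line)
    if m is None:
        continue  # skip lines that lack either field
    cid = int(m.group(1))
    siz = int(m.group(2))
    # dict.has_key() was removed in Python 3; use the "in" operator
    if cid not in res:
        res[cid] = 0
    res[cid] += siz

for cust in res:
    print("Customer ID %d - %d bytes" % (cust, res[cust]))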
