Python: counting occurrences across multiple lines with a loop

3 answers

User

Answer 1 · edited 2024-05-16 01:04:18

Here is my solution. It uses a defaultdict of Counters together with a regex that uses lookahead:

import re
from collections import Counter, defaultdict

with open('in.txt', 'r') as f:
    # make sure we have only \n as line ending and no leading or trailing whitespace;
    # this keeps the regex simpler
    alltext = '\n'.join(line.strip() for line in f)

# capture the keyword script\d+, then lazily capture everything after it;
# the lookahead stops the match as soon as (and only when) the next word is
# 'script' or the end of the string is reached
scriptPattern = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)

# just find everything that matches launch\d+
launchPattern = re.compile(r'launch\d+')

# create a defaultdict with a counter for every entry
scriptDict = defaultdict(Counter)

# go through all matches
for match in scriptPattern.finditer(alltext):
    script, body = match.groups()
    # update the counter of this script
    scriptDict[script].update(launchPattern.findall(body))

# print the results
for script in sorted(scriptDict):
    counter = scriptDict[script]
    if counter:
        print('{} launches:'.format(script))
        for launch in sorted(counter):
            count = counter[launch]
            print('\t{} {} time(s)'.format(launch, count))
    else:
        print('{} launches nothing'.format(script))

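Using the string from regex101 (see the link above), this prints a 'scriptN launches:' line for each script, followed by one tab-indented 'launchM K time(s)' line per launch, or 'scriptN launches nothing' for a script that launches nothing.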
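Below is a minimal, self-contained sketch of the same defaultdict(Counter) plus lookahead approach. It wraps the logic in a function so it runs on an in-memory string rather than 'in.txt'. The sample input (a 'scriptN' line followed by 'launchN.VBS' lines) and the helper name count_launches are assumptions for illustration and are not part of the original answer.

```python
import re
from collections import Counter, defaultdict

# Same patterns as the answer above: a scriptN header, then everything up to
# (but not including) the next "script" or the end of the text.
SCRIPT_PATTERN = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)
LAUNCH_PATTERN = re.compile(r'launch\d+')


def count_launches(text):
    """Hypothetical helper: map each scriptN to a Counter of launchN names."""
    # Normalise line endings and strip surrounding whitespace on each line.
    alltext = '\n'.join(line.strip() for line in text.splitlines())
    script_dict = defaultdict(Counter)
    for match in SCRIPT_PATTERN.finditer(alltext):
        script, body = match.groups()
        # Accessing script_dict[script] creates an empty Counter even when
        # the body contains no launches.
        script_dict[script].update(LAUNCH_PATTERN.findall(body))
    return script_dict


# Assumed sample input; the real file from the question is not shown here.
SAMPLE = """script1
launch1.VBS
launch2.VBS
script2
launch1.VBS
script1
launch1.VBS
script3
"""

result = count_launches(SAMPLE)
for script in sorted(result):
    print(script, dict(sorted(result[script].items())))
# script1 {'launch1': 2, 'launch2': 1}
# script2 {'launch1': 1}
# script3 {}
```

Repeated 'script1' sections accumulate into the same Counter, and 'launch\d+' deliberately captures only 'launch1', not the '.VBS' suffix.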

User

Answer 2 · edited 2024-05-16 01:04:18

Why not use a multi-line regular expression? The script then becomes:

import re

# read all the text of the file, and clean it up
with open('counts.txt', 'rt') as f:
    alltext = '\n'.join(line.strip() for line in f)

# find all occurrences of a script line followed by a launch line;
# inline flags such as (?mi) must start the pattern (Python 3.11+ rejects them elsewhere)
cont = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', alltext)

# accumulate the counts of each launch number for each script number
# into nested dictionaries
scriptcounts = {}
for scriptnum,launchnum in cont:
    # if we haven't seen this scriptnumber before, create the dictionary for it
    if scriptnum not in scriptcounts:
        scriptcounts[scriptnum]={}
    # if we haven't seen this launchnumber with this scriptnumber before,
    # initialize count to 0
    if launchnum not in scriptcounts[scriptnum]:
        scriptcounts[scriptnum][launchnum] = 0
    # increment the count for this combination of script and launch number
    scriptcounts[scriptnum][launchnum] += 1

# produce the output in order of increasing scriptnum/launchnum
for scriptnum in sorted(scriptcounts.keys()):
    for launchnum in sorted(scriptcounts[scriptnum].keys()):
        print("script%s\nlaunch%s.VBS\n# count %d\n" % (scriptnum, launchnum, scriptcounts[scriptnum][launchnum]))

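The output (in the format you asked for) lists each script/launch combination as three lines, 'scriptN', 'launchM.VBS' and '# count K', followed by a blank line.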

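re.findall() returns a list of all matches, and each match is a tuple of the parenthesised groups in the pattern. The exception is (?mi), which is not a group but an inline directive: m (MULTILINE) makes ^ and $ match at the start and end of every line, and i (IGNORECASE) makes the match case-insensitive. Fragments such as 'launch(\d+)' pull only the number after script/launch into the match; the group could just as easily include the word itself, e.g. '(script\d+)' or '(launch\d+\.VBS)', and only the print statement would need to change to handle that.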
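To see the list-of-tuples shape that re.findall returns for a pattern with two groups, here is a small sketch on an assumed in-memory sample. It puts the (?mi) flags at the very start of the pattern, since Python 3.11 and later reject global inline flags anywhere else. As an alternative to the nested dictionaries (a swapped-in technique, not the answer's own), it counts the (script, launch) tuples directly with collections.Counter.

```python
import re
from collections import Counter

# Assumed sample text, already cleaned up as in the answer (one item per line).
alltext = "script1\nlaunch1.VBS\nscript2\nlaunch3.vbs\nscript1\nlaunch1.VBS"

# (?mi) comes first: m lets ^ and $ match at every line boundary,
# i lets ".VBS" also match ".vbs".
pairs = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', alltext)
print(pairs)  # [('1', '1'), ('2', '3'), ('1', '1')]

# Each (script number, launch number) tuple becomes one Counter key.
counts = Counter(pairs)
for (scriptnum, launchnum), n in sorted(counts.items()):
    print("script%s launch%s.VBS: %d" % (scriptnum, launchnum, n))
# script1 launch1.VBS: 2
# script2 launch3.VBS: 1
```

Like the answer's pattern, this only pairs each script line with the launch line directly below it.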

HTH, Barny

User

Answer 3 · edited 2024-05-16 01:04:18

Here is an approach using nested dictionaries. Let me know if you want the output in a different format:

#!/usr/bin/env python3

import re
script_dict={}
with open('infile.txt','r') as infile:
    scriptre = re.compile(r"^script\d+$")
    for line in infile:
        line = line.rstrip()
        if scriptre.match(line) is not None:
            script_dict[line] = {}

    infile.seek(0) # go to beginning
    launchre = re.compile(r"^launch\d+\.[vV][bB][sS]$")
    current=None
    for line in infile:
        line = line.rstrip()
        if line in script_dict:
            current=line
        elif launchre.match(line) is not None and current is not None:
            if line not in script_dict[current]:
                script_dict[current][line] = 1 
            else:
                script_dict[current][line] += 1

print(script_dict)
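The two passes over 'infile.txt' (with infile.seek(0) between them) can be folded into one, because a script line can register its dictionary when it is first seen. The sketch below keeps the answer's line format and regexes. The function name count_by_script is hypothetical, and io.StringIO with an assumed sample stands in for the real file. It also replaces the explicit if/else with dict.get.

```python
import io
import re

SCRIPT_RE = re.compile(r"^script\d+$")
LAUNCH_RE = re.compile(r"^launch\d+\.[vV][bB][sS]$")


def count_by_script(lines):
    """Hypothetical single-pass version of the nested-dictionary approach."""
    script_dict = {}
    current = None
    for line in lines:
        line = line.rstrip()
        if SCRIPT_RE.match(line):
            # First time a script appears: create its (possibly empty) dict.
            script_dict.setdefault(line, {})
            current = line
        elif current is not None and LAUNCH_RE.match(line):
            counts = script_dict[current]
            counts[line] = counts.get(line, 0) + 1
    return script_dict


# io.StringIO stands in for open('infile.txt'); the content is an assumed sample.
sample = io.StringIO("script1\nlaunch1.VBS\nlaunch1.VBS\nscript2\nlaunch2.vbs\nscript3\n")
result = count_by_script(sample)
print(result)
# {'script1': {'launch1.VBS': 2}, 'script2': {'launch2.vbs': 1}, 'script3': {}}
```

Because any iterable of lines works, the same function can be called with an open file object directly.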
