Python: counting occurrences that span multiple lines using loops

Posted 2024-05-16 01:04:18


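I would like a quick Python way to get a count using a loop. Honestly, I am too embarrassed to post my own attempts, because none of them currently work.

Given a sample text file structured like this: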

script7
BLANK
INTERRUPTION
script2
launch4.VBS
script3
script8
launch3.VBS
script5
launch1.VBS
script6

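I want to count every time a script[Y] line is followed by a launch[X] line. Launch values range from 1 to 5, and script values range from 1 to 15.

Taking script3 as an example, I need a count of each of the following in a given file: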

script3
launch1
#count this

script3
launch2
#count this

script3
launch3
#count this

script3
launch4
#count this

script3
launch4
#count this

script3
launch5
#count this

I think the number of loops involved here is beyond my knowledge of Python. Any help would be greatly appreciated.


3 Answers

Here is my solution. It uses a defaultdict of Counters and a regex with a lookahead.

import re
from collections import Counter, defaultdict

with open('in.txt', 'r') as f:
    # make sure we have only \n as lineend and no leading or trailing whitespaces
    # this makes the regex less complex
    alltext = '\n'.join(line.strip() for line in f)

# find keyword script\d+ and capture it, then lazy expand and capture everything
# with lookahead so that we stop as soon as and only if next word is 'script' or
# end of the string
scriptPattern = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)

# just find everything that matches launch\d+
launchPattern = re.compile(r'launch\d+')

# create a defaultdict with a counter for every entry
scriptDict = defaultdict(Counter)

# go through all matches
for match in scriptPattern.finditer(alltext):
    script, body = match.groups()
    # update the counter of this script
    scriptDict[script].update(launchPattern.findall(body))

# print the results
for script in sorted(scriptDict):
    counter = scriptDict[script]
    if len(counter):
        print('{} launches:'.format(script))
        for launch in sorted(counter):
            count = counter[launch]
            print('\t{} {} time(s)'.format(launch, count))
    else:
        print('{} launches nothing'.format(script))

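Run against the test string from regex101, this prints, for each script, either the launches found after it with the number of times each occurred, or a note that the script launches nothing.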
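To make the lookahead approach above easier to try without a file, here is a minimal sketch that applies the same two patterns to an in-memory string. The sample text and the helper name `count_launches` are assumptions made for illustration; they are not part of the original answer.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample, one entry per line, modelled on the question's example.
sample = "\n".join([
    "script7", "BLANK", "INTERRUPTION",
    "script2", "launch4.VBS",
    "script3",
    "script8", "launch3.VBS",
    "script5", "launch1.VBS",
    "script6",
])

# Same patterns as in the answer: capture a script name, then lazily capture
# everything up to (but not including) the next 'script' or the end of the text.
script_pattern = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)
launch_pattern = re.compile(r'launch\d+')


def count_launches(text):
    """Return {script: Counter of launch names found before the next script}."""
    counts = defaultdict(Counter)
    for match in script_pattern.finditer(text):
        script, body = match.groups()
        counts[script].update(launch_pattern.findall(body))
    return counts


counts = count_launches(sample)
for script in sorted(counts):
    print(script, dict(counts[script]))
```

Note that this pattern attributes every launch that appears before the next script line to the current script, even when other lines sit in between. If only a launch on the very next line should count, a stricter pattern is needed.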

Why not use a multiline regular expression? The script then becomes:

import re

# read all the text of the file, and clean it up
with open('counts.txt', 'rt') as f:
    alltext = '\n'.join(line.strip() for line in f)

# find all occurrences of a script line immediately followed by a launch line;
# (?mi) must come first: m lets ^ and $ match at every line, i ignores case
cont = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', alltext)

# accumulate the counts of each launch number for each script number
# into nested dictionaries
scriptcounts = {}
for scriptnum, launchnum in cont:
    # if we haven't seen this script number before, create the dictionary for it
    if scriptnum not in scriptcounts:
        scriptcounts[scriptnum] = {}
    # if we haven't seen this launch number with this script number before,
    # initialize count to 0
    if launchnum not in scriptcounts[scriptnum]:
        scriptcounts[scriptnum][launchnum] = 0
    # increment the count for this combination of script and launch number
    scriptcounts[scriptnum][launchnum] += 1

# produce the output in order of increasing scriptnum/launchnum
# (sort numerically, because the captured numbers are strings)
for scriptnum in sorted(scriptcounts, key=int):
    for launchnum in sorted(scriptcounts[scriptnum], key=int):
        print("script%s\nlaunch%s.VBS\n# count %d\n" % (scriptnum, launchnum, scriptcounts[scriptnum][launchnum]))

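The output follows the format you asked for: for each combination, the script line, the launch line, and then a "# count N" line.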

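re.findall() returns a list of all matches. Each match is a tuple of the parenthesized () groups in the pattern. The exception is (?mi), which is not a group but a directive: it tells the regex engine to let ^ and $ match at line boundaries (multiline) and to match case-insensitively. The groups after 'script' and 'launch' pull only the numbers into the match. They could just as easily capture the whole name, as in '(script\d+)' and likewise '(launch\d+\.VBS)'; only the print would need to change to handle that.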

HTH, Barney
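The behaviour of re.findall() described in the answer above can be checked on a small string. This is a sketch only: the sample text is made up, and collections.Counter is swapped in here as a shortcut for the nested dictionaries the answer builds by hand.

```python
import re
from collections import Counter

# Hypothetical sample text: script lines followed by launch lines.
text = "script3\nlaunch4.VBS\nscript12\nlaunch1.vbs\nscript3\nlaunch4.VBS"

# (?mi) at the very start enables MULTILINE (^ and $ match at each line)
# and IGNORECASE. With two groups, findall returns a list of 2-tuples.
pairs = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', text)
print(pairs)

# Widening the groups captures the full names instead of only the numbers.
full = re.findall(r'(?mi)^(script\d+)\n(launch\d+\.VBS)$', text)
print(full)

# Counting identical (script, launch) tuples gives the requested totals.
pair_counts = Counter(pairs)
print(pair_counts[('3', '4')])
```

Because each match is a plain tuple, it can be used directly as a dictionary or Counter key, which avoids the explicit "have we seen this before" checks.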

Here is an approach that uses nested dictionaries. Let me know if you would like the output formatted differently:

#!/usr/bin/env python3

import re
script_dict={}
with open('infile.txt','r') as infile:
    scriptre = re.compile(r"^script\d+$")
    for line in infile:
        line = line.rstrip()
        if scriptre.match(line) is not None:
            script_dict[line] = {}

    infile.seek(0) # go to beginning
    launchre = re.compile(r"^launch\d+\.[vV][bB][sS]$")
    current=None
    for line in infile:
        line = line.rstrip()
        if line in script_dict:
            current=line
        elif launchre.match(line) is not None and current is not None:
            if line not in script_dict[current]:
                script_dict[current][line] = 1 
            else:
                script_dict[current][line] += 1

print(script_dict)
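The script above prints the raw nested dictionary. As one possible way to present it in the layout the question asks for, here is a small sketch. The helper names `natural_key` and `format_counts` and the example dictionary are hypothetical, chosen only to illustrate the idea.

```python
import re


def natural_key(name):
    """Sort key that orders 'script10' after 'script9' by the embedded number."""
    match = re.search(r'\d+', name)
    return (int(match.group()) if match else -1, name)


def format_counts(script_dict):
    """Render {script: {launch: count}} as script / launch / '# count N' blocks."""
    lines = []
    for script in sorted(script_dict, key=natural_key):
        for launch in sorted(script_dict[script], key=natural_key):
            lines.append(script)
            lines.append(launch)
            lines.append('# count {}'.format(script_dict[script][launch]))
            lines.append('')  # blank line between blocks
    return '\n'.join(lines)


# Hypothetical data in the same shape as script_dict from the answer.
example = {
    'script10': {'launch1.VBS': 1},
    'script3': {'launch4.VBS': 2, 'launch1.VBS': 1},
    'script6': {},
}
report = format_counts(example)
print(report)
```

Scripts that never launch anything, such as script6 in the example data, produce no block at all, which matches the question's request to count only script/launch pairs.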
