How do I use awk to replace different text blocks in all combinations?

1 answer

User

#1 · Posted 2024-05-16 03:21:40

Here is a python script that reads a cobol input file and prints every possible combination of the defined and redefined variables:

#!/usr/bin/python
"""Read cobol file and print all possible redefines."""
import sys
from itertools import product

def readfile(fname):
    """Read cobol file & return a master list of lines and namecount of redefined lines."""
    master = []
    namecount = {}
    with open(fname) as f:
        for line in f:
            line = line.rstrip(' .\t\n')
            if not line:
                continue
            words = line.split()
            n = int(words[0])
            if '=' in words or 'REDEFINES' in words:
                name = words[3]
            else:
                name = words[1]
            master.append((n, name, line))
            namecount[name] = namecount.get(name, 0) + 1
    # py2.7: namecount = {key: val for key, val in namecount.items() if val > 1}
    namecount = dict((key, val) for key, val in namecount.items() if val > 1)

    return master, namecount

def compute(master, skip=None):
    """Return new cobol file given master and skip parameters."""
    if skip is None:
        skip = {}
    seen = {}
    skip_to = None
    output = ''
    for n, name, line in master:
        if skip_to and n > skip_to:
            continue
        seen[name] = seen.get(name, 0) + 1
        if seen[name] != skip.get(name, 1):
            skip_to = n
            continue
        skip_to = None
        output += line + '\n' 
    return output

def find_all(master, namecount):
    """Return list of all possible output files given master and namecount."""
    keys = namecount.keys()
    values = [namecount[k] for k in keys]
    out = []
    for combo in product(*[range(1, v + 1) for v in values]):
        skip = dict(zip(keys, combo))
        new = compute(master, skip=skip)
        if new not in out:
            out.append(new)
    return out

def main(argv):
    """Process command line arguments and print results."""
    fname = argv[-1]
    master, namecount = readfile(fname)
    out = find_all(master, namecount)
    print('\n'.join(out))

if __name__ == '__main__':
    main(sys.argv)

If the script above is saved in a file named cobol.py, it can be run as:

python cobol.py input

The various possible combinations of definitions and redefinitions are printed to stdout.

The script runs under either python2 (2.6+) or python3.

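Explanation

The code uses three functions:

readfile reads the input file and returns two variables that summarize the structure of its contents.
compute takes two arguments and computes one output block from them.
find_all determines all possible output blocks, uses compute to create them, and returns them as a list.

Let's look at each function in more detail: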

readfile

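readfile takes the name of the input file as its argument and returns a list, master, and a dictionary, namecount. For every non-blank line in the input file, master holds a tuple containing (1) the level number, (2) the name being defined or redefined, and (3) the original line itself, with trailing whitespace and periods stripped. For the sample input file, readfile returns this value for master: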

[(1, 'hello', '01 hello'),
 (2, 'stack', '    02 stack'),
 (2, 'overflow', '    02 overflow'),
 (4, 'hi', '        04 hi'),
 (2, 'overflow', '    02 friends = overflow'),
 (3, 'this', '        03 this'),
 (3, 'is', '        03 is'),
 (3, 'is', '        03 my = is'),
 (3, 'life', '        03 life'),
 (2, 'lol', '    02 lol'),
 (2, 'im', '    02 im'),
 (2, 'im', '    02 joking = im'),
 (3, 'filler', '        03 filler')]

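readfile also returns the dictionary namecount, which has one entry for each redefined name and counts how many definitions/redefinitions that name has. For the sample input file, namecount has the value: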

{'im': 2, 'is': 2, 'overflow': 2}

This says that im, is, and overflow each have two possible values.
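As a rough illustration of how such a count can be built, the sketch below repeats the counting and filtering steps on the names from the sample master list. The helper name count_redefined is hypothetical and is not part of the script; it only isolates the two lines of readfile that maintain namecount.

```python
def count_redefined(names):
    """Count each name and keep only the names seen more than once.

    count_redefined is a hypothetical helper used for illustration only.
    """
    counts = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    # A name that occurs once has a single definition and offers no choice.
    return dict((key, val) for key, val in counts.items() if val > 1)


# Names in the order in which they appear in the sample master list.
names = ['hello', 'stack', 'overflow', 'hi', 'overflow', 'this', 'is',
         'is', 'life', 'lol', 'im', 'im', 'filler']
namecount = count_redefined(names)
print(sorted(namecount.items()))
# prints [('im', 2), ('is', 2), ('overflow', 2)]
```

Names with a single definition are dropped because they never give rise to an alternative output block.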

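readfile is, of course, designed to handle the input file format used in the current version of the question. As far as possible, it is also designed to work with the formats from earlier versions of the question. For example, a variable redefinition is accepted whether it is signalled by an equals sign (current version) or by the word REDEFINES, as in earlier versions. This is meant to keep the script as flexible as possible.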
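The per-line parsing can be sketched in isolation as below. The function name parse_line is hypothetical (the script does this inline inside readfile), and the REDEFINES sample line is only an assumption about what the earlier input format looked like.

```python
def parse_line(line):
    """Return (level, name, stripped_line) for one non-blank input line.

    parse_line is a hypothetical helper; readfile performs these steps inline.
    """
    line = line.rstrip(' .\t\n')      # drop trailing blanks, periods, newline
    words = line.split()
    level = int(words[0])             # '02' becomes 2
    if '=' in words or 'REDEFINES' in words:
        name = words[3]               # the name being redefined
    else:
        name = words[1]               # the name being defined
    return level, name, line


print(parse_line('    02 friends = overflow\n'))
# prints (2, 'overflow', '    02 friends = overflow')
print(parse_line('    02 friends REDEFINES overflow.\n'))   # assumed older format
# prints (2, 'overflow', '    02 friends REDEFINES overflow')
print(parse_line('01 hello\n'))
# prints (1, 'hello', '01 hello')
```

In both redefinition syntaxes the fourth word is the name being redefined, which is why a single branch covers them.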

compute

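The function compute generates each output block. It takes two arguments. The first is master, which comes straight from readfile. The second is skip, which is derived from the namecount dictionary returned by readfile. For example, namecount says that im has two possible definitions. The following shows how compute can be used to generate the output block for each of them: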

In [14]: print compute(master, skip={'im':1, 'is':1, 'overflow':1})
01 hello
    02 stack
    02 overflow
        04 hi
    02 lol
    02 im

In [15]: print compute(master, skip={'im':2, 'is':1, 'overflow':1})
01 hello
    02 stack
    02 overflow
        04 hi
    02 lol
    02 joking = im
        03 filler

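Note that the first call to compute above generated the block that uses the first definition of im, and the second call generated the block that uses the second definition.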
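To see why 03 filler appears only in the second block, the sketch below mirrors the selection logic of compute but records a verdict for every line instead of building the output string. The function name trace and the verdict strings are hypothetical, and the master list is cut down to the last four sample lines to keep the example short.

```python
def trace(master, skip):
    """Mirror the selection logic of compute, recording a verdict per line.

    trace is a hypothetical helper used for illustration only.
    """
    seen = {}
    skip_to = None
    decisions = []
    for n, name, line in master:
        if skip_to and n > skip_to:
            # Deeper level than a line that was just skipped: it is a child.
            decisions.append((line.strip(), 'dropped: child of a skipped line'))
            continue
        seen[name] = seen.get(name, 0) + 1
        if seen[name] != skip.get(name, 1):
            skip_to = n
            decisions.append((line.strip(), 'dropped: not the chosen definition'))
            continue
        skip_to = None
        decisions.append((line.strip(), 'kept'))
    return decisions


# The last four entries of the sample master list.
master = [(2, 'lol', '    02 lol'),
          (2, 'im', '    02 im'),
          (2, 'im', '    02 joking = im'),
          (3, 'filler', '        03 filler')]

for line, verdict in trace(master, {'im': 1}):
    print('%-15s %s' % (line, verdict))
```

With skip={'im': 1} the line 02 joking = im is the second occurrence of im and is dropped, and 03 filler is dropped with it because its level number is greater. With skip={'im': 2} the roles are reversed and 03 filler is kept.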

find_all

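With the two functions above, the last step is clearly to generate every distinct combination of definitions and print them. That is what find_all does. Given master and namecount, it systematically runs through all available combinations of definitions and calls compute to create a block for each one. It collects all the unique blocks that can be created this way and returns them.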
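The enumeration step can be sketched on its own as follows. It builds one skip dictionary per combination using itertools.product, as find_all does; sorting the keys is an addition made here only so that the listing is reproducible.

```python
from itertools import product

namecount = {'im': 2, 'is': 2, 'overflow': 2}

keys = sorted(namecount)   # fixed order, only to make the listing reproducible
choices = [range(1, namecount[k] + 1) for k in keys]

# One skip dictionary per combination of chosen definitions.
skips = [dict(zip(keys, combo)) for combo in product(*choices)]

print(len(skips))
# prints 8
print(skips[0] == {'im': 1, 'is': 1, 'overflow': 1})
# prints True
print(skips[-1] == {'im': 2, 'is': 2, 'overflow': 2})
# prints True
```

There are 8 combinations here but only 6 distinct blocks for the sample input: when the first definition of overflow is chosen, the lines that define is are dropped as children of the skipped redefinition, so the choice for is makes no difference. That appears to be why find_all checks whether a block is already in out before appending it.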

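The output returned by find_all is a list of strings. Each string is the block corresponding to one combination of definitions/redefinitions. Using the sample input from the question, this is what find_all returns: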

In [16]: find_all(master, namecount)
Out[16]: 
['01 hello\n    02 stack\n    02 overflow\n        04 hi\n    02 lol\n    02 im\n',
 '01 hello\n    02 stack\n    02 friends = overflow\n        03 this\n        03 is\n        03 life\n    02 lol\n    02 im\n',
 '01 hello\n    02 stack\n    02 overflow\n        04 hi\n    02 lol\n    02 joking = im\n        03 filler\n',
 '01 hello\n    02 stack\n    02 friends = overflow\n        03 this\n        03 is\n        03 life\n    02 lol\n    02 joking = im\n        03 filler\n',
 '01 hello\n    02 stack\n    02 friends = overflow\n        03 this\n        03 my = is\n        03 life\n    02 lol\n    02 im\n',
 '01 hello\n    02 stack\n    02 friends = overflow\n        03 this\n        03 my = is\n        03 life\n    02 lol\n    02 joking = im\n        03 filler\n']

As an example, take the fourth string returned by find_all and, for better formatting, print it:

In [18]: print find_all(master, namecount)[3]
01 hello
    02 stack
    02 friends = overflow
        03 this
        03 is
        03 life
    02 lol
    02 joking = im
        03 filler

In the full script, the strings returned by find_all are joined together and printed to stdout as follows:

out = find_all(master, namecount)              
print('\n'.join(out))

This way, the output shows all possible blocks.

Answers to earlier versions of the question

Answer to the original question

awk 'f==0 && !/REDEFINES/{s=s"\n"$0;next} /REDEFINES/{f=1;print s t>("output" ++c ".txt");t=""} {t=t"\n"$0} END{print s t>("output" ++c ".txt")}' input

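Explanation:

This program uses the following variables:

f is a flag that is zero before the first REDEFINES line and one afterwards.
s holds all the text that precedes the first REDEFINES line.
t holds the text of the current REDEFINES section.
c is a counter used to determine the name of the output file.

The code works as follows: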

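f==0 && !/REDEFINES/{s=s"\n"$0;next}
Until the first REDEFINES line is encountered, the text is saved in the variable s, and we skip the remaining commands and jump to the next line.
/REDEFINES/{f=1;print s t>("output" ++c ".txt");t=""}
Every time a REDEFINES line is encountered, we set the flag f to 1 and print the prolog section s together with the current REDEFINES section to a file named outputn.txt, where n is replaced by the value of the counter c.
Since we are at the start of a new REDEFINES section, the variable t is set to empty.
{t=t"\n"$0}
This saves the current line of this REDEFINES section in the variable t.
END{print s t>("output" ++c ".txt")}
This prints the output file for the last REDEFINES section.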
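For readers more comfortable with python, the same control flow can be sketched as below. This is an illustrative port, not part of the answer: the function name split_on_redefines and the sample lines are hypothetical, the texts are returned in a list instead of being written to output1.txt, output2.txt and so on, and the trailing newline that awk's print appends is omitted.

```python
def split_on_redefines(lines):
    """Return the text of each output that the awk one-liner would write.

    split_on_redefines is a hypothetical name; files are replaced by a list.
    """
    s = ''          # prolog: everything before the first REDEFINES line
    t = ''          # text of the current REDEFINES section
    f = 0           # becomes 1 once a REDEFINES line has been seen
    outputs = []    # stands in for output1.txt, output2.txt, ...
    for line in lines:
        if f == 0 and 'REDEFINES' not in line:
            s += '\n' + line
            continue                    # awk: next
        if 'REDEFINES' in line:
            f = 1
            outputs.append(s + t)       # awk: print s t > ("output" ++c ".txt")
            t = ''
        t += '\n' + line
    outputs.append(s + t)               # awk: END block
    return outputs


# Hypothetical input, not taken from the question.
sample = ['01 A', '02 B', '02 C REDEFINES B', '03 D', '02 E REDEFINES B']
for i, text in enumerate(split_on_redefines(sample), 1):
    print('output%d.txt: %r' % (i, text))
```

For this input the first text is the prolog alone, because t is still empty when the first REDEFINES line is reached. Every text also begins with a newline, since each line is appended with a leading "\n".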

Minor improvement

Each output file produced by the code above has a leading blank line. The code below removes it with the substr function:

awk '/REDEFINES/{f=1;print substr(s,2) t>("output" ++c ".txt");t=""} f==0 {s=s"\n"$0;next} {t=t"\n"$0} END{print substr(s,2) t>("output" ++c ".txt")}' input

For variety, this version uses slightly different logic but otherwise produces the same result.

Answer to the revised question

awk 'f==1 && pre==$1 && !/REDEFINES/{tail=tail "\n" $0} /REDEFINES/{pre=$1;f=1;t[++c]="\n"$0} f==0 {head=head"\n"$0;next} pre!=$1{t[c]=t[c]"\n"$0} END{for (i=0;i<=c;i++) {print head t[i] tail>("output" (i+1) ".txt")}}' file

