Python - Regular expressions - Modifying text files

数据工程师

Asked 2025-04-15 15:23

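I'm new to Python and would appreciate some help with the following task :-)

I have a tree of assorted files, some of which are C source files.

I want to modify these C files with a Python script.

The C code contains four macro definitions: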

#define ZR_LOG0(Id, Class, Seveity, Format)
#define ZR_LOG1(Id, Class, Seveity, Format, Attr0)
#define ZR_LOG2(Id, Class, Seveity, Format, Attr0, Attr1)
#define ZR_LOG3(Id, Class, Seveity, Format, Attr0, Attr1, Attr2)

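Various ZR_LOGn lines are scattered throughout the C source code.

For example: ZR_LOG1 (1, LOG_CLASS_3, LOG_INFO, "hello world %d", 76);

Whitespace (spaces, tabs) may appear anywhere on the line (between fields).

The Python script's task is as follows:

Replace every 'Id' field (an integer whose original value we don't care about) with a sequential counter. (The 'Id' field of the first 'LOG' line we encounter gets the value 0, the next gets 1, and so on.)
In a separate output file, create an index line of the form { NewId, Format } for each ZR_LOG line. For the example above, that gives: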
```
{ 0, "hello world %d" },
```

Thanks a lot for your help...

I have started writing the following code; you can take a look at it or ignore it entirely.

'''
Created on Oct 25, 2009

@author: Uri Shkolnik

The following version does find & replace LOG Ids for all 
C source files in a dir (and below) with sequential counter, 
The files are assumed to be UTF-8 encoded. 
(which works fine if they are ASCII, because ASCII is a 
subset of UTF-8)
It also assemble new index file, composed from all new IDs and format fields

'''

import os, sys, re, shutil

mydir= '/home/uri/proj1'
searched_pattern0 = 'ZR_LOG0'

def search_and_replace(filepath):
    ''' replaces all string by a regex substitution '''
    backupName=filepath+'~re~'

    print 'reading:', filepath
    input = open(filepath,'rb')
    s=unicode(input.read(),'utf-8')
    input.close()

    m = re.match(ur'''[:space:]ZR_LOG[0-3].*\(.*[0-9]{0,10},LOG_''', s)
    print m

def c_files_search(dummy, dirr, filess):
    ''' search directories for file pattern *.c '''
    for child in filess:
        if '.c' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child):
            filepath = dirr+'/'+child
            search_and_replace(filepath)

os.path.walk(mydir, c_files_search, 3)

1 Answer

A few notes:

You can use '\s' to match whitespace characters.
Regular expression "capture groups" are useful here.
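
To make those two notes concrete, here is a minimal sketch (Python 3 assumed) of `\s` and capture groups applied to one line. The sample line is hypothetical, modeled on the example in the question, and the names `pattern`, `prefix` and `old_id` are only illustrative.

```python
import re

# Hypothetical sample line, modeled on the example in the question,
# with a tab and extra spaces thrown in around the Id.
line = 'ZR_LOG1 (\t1 , LOG_CLASS_3, LOG_INFO, "hello world %d", 76);'

pattern = re.compile(
    r'(ZR_LOG[0-3]\s*\(\s*)'  # group 1: macro name, "(" and any whitespace
    r'(\d+)'                  # group 2: the old integer Id
)

# search() scans the whole line; match() would only try at position 0.
match = pattern.search(line)
prefix = match.group(1)   # everything up to the Id
old_id = match.group(2)   # the Id digits themselves
```

Because `\s*` also matches tabs, the same pattern copes with whatever whitespace appears between the macro name, the parenthesis and the Id.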

So, I would do something like this:

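import re

# Assumes `lines` holds the lines of one C file, e.g. from readlines().
output = ''
counter = 0  # the question asks for the first Id to be 0
for line in lines:
    # Match only ZR_LOG lines and capture everything surrounding the Id.
    # re.DOTALL lets group(2) keep the trailing newline of the line.
    match = re.match(r'^(.*\bZR_LOG[0-3]\s*\(\s*)'  # group(1), before Id
                     r'\d+'                         # the old integer Id
                     r'(\s*,.*)$',                  # group(2), after Id
                     line, re.DOTALL)
    if match:
        # Add everything before Id, the counter value and everything after Id
        output += match.group(1) + str(counter) + match.group(2)
        counter += 1
        # And do extra logging etc.
    else:
        # The #define lines end up here, because "Id" is not a number
        output += line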
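
The loop above only renumbers the Ids. As a rough sketch of how both tasks from the question might fit together in Python 3, the version below uses `re.sub` with a callback and also collects the index lines. The names `LOG_CALL`, `renumber_logs` and `process_tree` are made up for this illustration. It assumes the Class and Severity fields contain no commas and that Format is a single double-quoted string literal; calls that break those assumptions are left untouched.

```python
import os
import re

# Assumption: Class and Severity contain no commas, and Format is one
# double-quoted C string literal (escaped quotes inside it are allowed).
LOG_CALL = re.compile(
    r'(\bZR_LOG[0-3]\s*\(\s*)'   # group 1: macro name up to the Id
    r'\d+'                       # the old integer Id (discarded)
    r'(\s*,[^,]*,[^,]*,\s*)'     # group 2: the Class and Severity fields
    r'("(?:[^"\\]|\\.)*")'       # group 3: the Format string literal
)


def renumber_logs(text, start=0):
    """Return (new_text, index_lines, next_id) for one file's contents."""
    index_lines = []
    counter = start

    def replace(match):
        nonlocal counter
        new_id = counter
        counter += 1
        index_lines.append('{ %d, %s },' % (new_id, match.group(3)))
        return match.group(1) + str(new_id) + match.group(2) + match.group(3)

    new_text = LOG_CALL.sub(replace, text)
    return new_text, index_lines, counter


def process_tree(root, index_path):
    """Renumber every *.c file under root and write one combined index.

    Files are overwritten in place with no backup, so try it on a copy.
    """
    counter = 0
    all_index_lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith('.c'):
                continue
            path = os.path.join(dirpath, name)
            # newline='' preserves the file's original line endings
            with open(path, encoding='utf-8', newline='') as f:
                text = f.read()
            new_text, index_lines, counter = renumber_logs(text, counter)
            if index_lines:
                with open(path, 'w', encoding='utf-8', newline='') as f:
                    f.write(new_text)
                all_index_lines.extend(index_lines)
    with open(index_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(all_index_lines) + '\n')
```

Calling something like `process_tree('/home/uri/proj1', 'log_index.txt')` would then walk the directory from the question; the index file name here is just a placeholder. Matching against the whole file text rather than line by line means a call whose arguments wrap onto the next line is still found, and the `#define` lines are skipped because their first argument is `Id`, not digits.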

Answered 2025-04-15 by Python大师
