How to skip a docstring using regular expressions

2 votes

3 answers

1247 views

Asked 2025-04-11 09:19

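I want to insert some import statements into a Python source file, but I want them placed after the initial docstring. Suppose I load the file into a variable called lines like this: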

lines = open('filename.py').readlines()

How can I find the line number of the line where the docstring ends?

regex docstring line-numbers source-files

3 Answers

Here is a function based on Brian's excellent answer that you can use to split a file into its docstring and its code:

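import tokenize


def split_docstring_and_code(infile):
    """Split a Python source file into (docstring part, remaining code)."""
    insert_index = None
    with open(infile) as f:
        for tok, text, (srow, scol), (erow, ecol), line in tokenize.generate_tokens(f.readline):
            if tok in (tokenize.COMMENT, tokenize.NL):
                continue  # Skip leading comments and the blank/NL tokens around them
            elif tok == tokenize.STRING:
                insert_index = erow, ecol  # (row, column) where the docstring ends
                break
            else:
                break  # No docstring found

    with open(infile) as f:
        lines = f.readlines()
    if insert_index is not None:
        # erow is 1-based, so lines[:erow] includes the docstring's last line
        erow = insert_index[0]
        return "".join(lines[:erow]), "".join(lines[erow:])
    else:
        return "", "".join(lines)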

This function assumes that no code follows the closing string delimiter on the line where the docstring ends.
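If that assumption does not hold for your files, one option is to split at the exact column where the string token ends instead of at the line boundary. What follows is only a minimal sketch and not part of the answer above: the name split_at_docstring_end is hypothetical, and it takes the source text as a string rather than a filename.

```python
import io
import tokenize


def split_at_docstring_end(source):
    """Return (head, tail), split exactly where the docstring token ends.

    Hypothetical helper for illustration; returns ("", source) when the
    first real token is not a string.
    """
    end = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # Leading comments and blank lines come before a docstring
        if tok.type == tokenize.STRING:
            end = tok.end  # (1-based row, 0-based column) just past the string
        break  # Only the first real token can be the docstring
    if end is None:
        return "", source
    lines = source.splitlines(keepends=True)
    erow, ecol = end
    last = lines[erow - 1]
    head = "".join(lines[:erow - 1]) + last[:ecol]
    tail = last[ecol:] + "".join(lines[erow:])
    return head, tail
```

With this version, any code that shares a line with the closing quote ends up at the start of the second return value, so nothing is lost, but you would still need to decide where to put a newline before inserting your imports.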

Answered 2025-04-11 by Python大师

Rather than using a regular expression or relying on a particular format, you can use Python's tokenize module.

import tokenize

insert_index = None
with open(filename) as f:
    for tok, text, (srow, scol), (erow, ecol), line in tokenize.generate_tokens(f.readline):
        if tok in (tokenize.COMMENT, tokenize.NL):
            continue  # Skip leading comments and the blank/NL tokens around them
        elif tok == tokenize.STRING:
            insert_index = erow, ecol  # (row, column) where the docstring ends
            break
        else:
            break  # No docstring found

This way you can even handle awkward cases such as:

# Comment
# """Not the real docstring"""
' this is the module\'s \
docstring, containing:\
""" and having code on the same line following it:'; this_is_code=42

exactly as Python itself handles them.
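To tie this back to the question, here is a rough sketch of how the end row found by the tokenizer could be used to insert import lines. The function name insert_after_docstring and its parameters are made up for illustration. It works on a source string, and when no docstring is found it simply inserts at the top of the file, which may not be what you want if the file starts with a shebang or an encoding comment.

```python
import io
import tokenize


def insert_after_docstring(source, new_lines):
    """Return source with new_lines inserted on the line after the docstring.

    Hypothetical helper; new_lines is a list of strings ending in newlines.
    """
    insert_row = 0  # Assumption: with no docstring, insert at the very top
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # Skip leading comments and blank lines
        if tok.type == tokenize.STRING:
            insert_row = tok.end[0]  # 1-based row of the docstring's last line
        break  # Only the first real token can be the docstring
    lines = source.splitlines(keepends=True)
    return "".join(lines[:insert_row] + list(new_lines) + lines[insert_row:])
```

For example, insert_after_docstring(open('filename.py').read(), ['import os\n']) would return the new file contents as a string, which you could then write back out.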

Answered 2025-04-11 by Python大师

If you are using the standard docstring format, you can do something like this:

count = 0
for line in lines:
    if count < 2:
        # Before or inside the docstring: count the opening and closing """ lines
        if line.startswith('"""'):
            count += 1
        continue
    # Line is after the docstring
Files without a docstring may need some adjustment, but if your files are formatted consistently, that should be fairly simple.
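As one possible adjustment, the sketch below returns 0 when the file has no docstring and also copes with a one-line docstring. It is only an illustration under stated assumptions: the name docstring_end_index is hypothetical, the docstring is assumed to use triple double quotes starting in column 0, and no escaped or nested triple quotes appear inside it.

```python
def docstring_end_index(lines):
    """Return the index of the first line after the docstring, or 0 if none.

    Hypothetical helper; assumes a triple-double-quoted docstring that
    starts at the beginning of a line.
    """
    i = 0
    # Skip leading blank lines and comment lines
    while i < len(lines) and (not lines[i].strip() or lines[i].lstrip().startswith('#')):
        i += 1
    if i == len(lines) or not lines[i].startswith('"""'):
        return 0  # No docstring found
    if '"""' in lines[i][3:]:
        return i + 1  # Docstring opens and closes on the same line
    for j in range(i + 1, len(lines)):
        if '"""' in lines[j]:
            return j + 1  # Line after the closing delimiter
    return 0  # Unterminated docstring: treat as no docstring
```

You could then insert with something like lines[idx:idx] = ['import os\n'], where idx is the returned index.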

Answered 2025-04-11 by Python大师
