str.startswith() doesn't behave the way I expect

1 vote

3 answers

5982 views

Asked 2025-04-15 11:55

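I'm trying to check for tab (\t) or space characters in a file, and I don't understand why this code doesn't work. I read a file, count its lines, and then record the name of each function in the file along with its number of lines of code. The code below is where I try to count the lines of a function.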

import re

...
    else:
            loc += 1
            for line in infile:
                line_t = line.lstrip()
                if len(line_t) > 0 \
                and not line_t.startswith('#') \
                and not line_t.startswith('"""'):
                    if not line.startswith('\s'):
                        print ('line = ' + repr(line))
                        loc += 1
                        return (loc, name)
                    else:
                        loc += 1
                elif line_t.startswith('"""'):
                    while True:
                        if line_t.rstrip().endswith('"""'):
                            break
                        line_t = infile.readline().rstrip()

            return(loc,name)

Output:

Enter the file name: test.txt
line = '\tloc = 0\n'

There were 19 lines of code in "test.txt"

Function names:

    count_loc -- 2 lines of code

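As you can see, my test print shows a line that starts with a tab, yet my if statement explicitly says (or so I thought) that the branch should only run when the line has no leading whitespace.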

Here is the complete test file I've been using:

def count_loc(infile):
    """ Receives a file and then returns the amount
        of actual lines of code by not counting commented
        or blank lines """

    loc = 0
    for line in infile:
        line = line.strip()
        if len(line) > 0 \
        and not line.startswith('//') \
        and not line.startswith('/*'):
            loc += 1
            func_loc, func_name = checkForFunction(line);
        elif line.startswith('/*'):
            while True:
                if line.endswith('*/'):
                    break
                line = infile.readline().rstrip()

    return loc

 if __name__ == "__main__":
    print ("Hi")
    Function LOC = 15
    File LOC = 19


3 Answers

You may be misunderstanding how string literals work. This is how you write a space or a tab:

space = ' '
tab = '\t'
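Building on those literals, here is a minimal sketch of the check the question seems to be after. `str.startswith` accepts a tuple of prefixes, so a single call can test for a leading space or tab. The helper name `is_indented` is made up for illustration.

```python
def is_indented(line):
    """Return True if the line begins with a space or a tab."""
    # str.startswith accepts a tuple of candidate prefixes.
    return line.startswith((' ', '\t'))


print(is_indented('\tloc = 0\n'))             # True
print(is_indented('def count_loc(infile):'))  # False
```

An empty line is reported as not indented, which fits the question's code because blank lines are filtered out before this check.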

Answered 2025-04-15 by Python大师


Your question has already been answered and this is a little off-topic, but I'd still like to mention it...

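If you want to analyse code, using a parser is usually simpler and less error-prone. If your code is written in Python, the standard library ships with tokenize and ast (the older parser module was removed in Python 3.10). For other languages you can find plenty of parsers online. ANTLR is a well-known parser generator, and it can generate Python parsers too.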

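For example, the following lines print the first token of every line in a Python module that contains something other than a comment or a string such as a docstring: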

import tokenize

# Token types that do not count as code on their own.
ignored_tokens = [tokenize.NEWLINE,tokenize.COMMENT,tokenize.N_TOKENS
                 ,tokenize.STRING,tokenize.ENDMARKER,tokenize.INDENT
                 ,tokenize.DEDENT,tokenize.NL]
with open('test.py', 'r') as f:
    g = tokenize.generate_tokens(f.readline)
    line_num = 0
    for a_token in g:
        # a_token[0] is the token type, a_token[2] is its (row, column) start.
        if a_token[2][0] != line_num and a_token[0] not in ignored_tokens:
            line_num = a_token[2][0]
            print(a_token)

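Because each a_token above has already been tokenized, it is also easy to check for function definitions. You can track where a function ends by watching the starting column of the current token, a_token[2][1]. If you want to do anything more complex, I'd suggest ast.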
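As a rough sketch of the ast route, the snippet below maps each function name to the number of source lines it spans. It assumes Python 3.8 or later, where nodes carry `end_lineno`. The name `function_line_counts` and the sample source are made up for illustration. The span includes docstring, comment and blank lines, so it is not the same as a count of real lines of code.

```python
import ast


def function_line_counts(source):
    """Map each function name to the number of source lines it spans."""
    tree = ast.parse(source)
    counts = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            counts[node.name] = node.end_lineno - node.lineno + 1
    return counts


sample = '''def count_loc(infile):
    """Docstring."""
    loc = 0
    return loc
'''
print(function_line_counts(sample))  # {'count_loc': 4}
```

To exclude comments and docstrings, the span could be combined with the tokenize filter shown earlier in this answer.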

Answered 2025-04-15 by Python大师


\s stands for a whitespace character when you match patterns with the re module.

But for startswith, which is an ordinary string method, \s means nothing special. It is not a pattern, just literal characters.
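To make the difference concrete, here is a small sketch. In an ordinary string literal, `'\s'` is not a recognised escape sequence, so Python keeps both the backslash and the letter `s` (recent Python versions also warn about it). The example spells it `'\\s'` to get the same two characters without the warning. The helper name `starts_with_whitespace` is hypothetical.

```python
import re

literal = '\\s'  # the same two characters that '\s' produces
print(len(literal))  # 2

line = '\tloc = 0\n'
# startswith compares literal characters, so a tab never matches backslash + s.
print(line.startswith(literal))  # False


def starts_with_whitespace(text):
    # re.match anchors at the start of the string; \s matches any whitespace.
    return re.match(r'\s', text) is not None


print(starts_with_whitespace(line))  # True
```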

Answered 2025-04-15 by Python大师

