Python - Checking the order of lines in a file

Asked 2025-04-15 17:11

How can I check the order of lines in a file?

Here is an example file:

a b c d e f
b c d e f g
1 2 3 4 5 0

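Requirements:

All lines beginning with the letter a must come before lines beginning with the letter b.
There is no limit on the number of lines beginning with a.
Lines beginning with a may or may not be present.
Lines containing integers must come after lines beginning with b.
Numeric lines must have at least two integers followed by a zero.
Failure to meet these conditions must raise an error.

I first tried to solve this with a rather long-winded for loop, but that failed because I could not access any line other than the first one. I also don't know how to define the position of one line relative to the others. There is no limit on the length of these files, so memory may be an issue as well.

Any suggestions are very welcome! Simple and readable solutions are especially good for this confused novice!

Thanks,

Seafoid.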

4 Answers

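Constraints on line order:

I. Once we have seen a line starting with 'b', there must be no more lines starting with 'a'.

II. If we encounter a numeric line, the previous line must start with 'b'. (Alternatively, your fourth condition could be read another way: every line starting with 'b' must be followed by a numeric line.)

Constraint on numeric lines (as a regular expression): /\d+\s+\d+\s+0\s*$/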

#!/usr/bin/env python
import re
import sys

# Assumption: the file to check is passed as the first command-line argument.
filename = sys.argv[1]

def is_numeric(line):
    # The line contains only digits and whitespace.
    return re.match(r'^\s*\d+(?:\s|\d)*$', line)

def valid_numeric(line):
    # The line ends with two integers followed by a zero.
    return re.search(r'(?:\d+\s+){2}0\s*$', line)

def error(msg, i, line):
    raise SyntaxError('%s at %s:%s: "%s"' % (msg, filename, i + 1, line))

seen_b, last_is_b = False, False
with open(filename) as f:
    for i, line in enumerate(f):  # reads one line at a time
        if not seen_b:
            seen_b = line.startswith('b')

        if seen_b and line.startswith('a'):
            error('failed I.', i, line)
        if not last_is_b and is_numeric(line):
            error('failed II.', i, line)
        if is_numeric(line) and not valid_numeric(line):
            error('not a valid numeric line', i, line)

        last_is_b = line.startswith('b')
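
To see what the two patterns accept, here is a small sketch that runs them over a few sample lines. The sample strings are made up for illustration, and the helpers return plain booleans instead of match objects, which is a small departure from the code above.

```python
import re

def is_numeric(line):
    # True when the line consists only of digits and whitespace.
    return re.match(r'^\s*\d+(?:\s|\d)*$', line) is not None

def valid_numeric(line):
    # True when the line ends with two integers followed by a zero.
    return re.search(r'(?:\d+\s+){2}0\s*$', line) is not None

# Hypothetical sample lines, including the trailing newline a file would give.
samples = ["1 2 3 4 5 0\n", "1 2 3\n", "5 0\n", "a b c\n"]
for sample in samples:
    print(repr(sample), is_numeric(sample), valid_numeric(sample))
```

Only the first sample passes both checks: "1 2 3" does not end in a zero, "5 0" has just one integer before the zero, and "a b c" is not numeric at all.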

Answered 2025-04-15 by Python大师

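You can read all the lines into a list with lines = open(thefile).readlines() and then work on that list. It is not the most efficient approach, but it is very simple, which is what you asked for.

Simpler still is to use several loops, one per condition (except condition 2, which is not a condition that can be violated, and condition 5, which isn't really a condition ;-). For example, "all lines beginning with a must come before lines beginning with b" can be read as "the last line beginning with a, if any, must come before the first line beginning with b", so: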

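# Index of the last line starting with 'a', or -1 if there is none.
# (-1 is added as a fallback so max() never sees an empty sequence.)
lastwitha = max([i for i, line in enumerate(lines)
                 if line.startswith('a')] + [-1])
# Index of the first line starting with 'b', or len(lines) if there is none.
firstwithb = next((i for i, line in enumerate(lines)
                   if line.startswith('b')), len(lines))
if lastwitha > firstwithb:
    raise ValueError("a line starting with 'a' follows a line starting with 'b'")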

Then the same idea works for "lines containing integers":

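# Index of the first line containing any digit, or len(lines) if there is none.
firstwithint = next((i for i, line in enumerate(lines)
                     if any(c in line for c in '0123456789')), len(lines))
if firstwithint < firstwithb:
    raise ValueError("a line containing integers precedes the first 'b' line")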

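These hints should be enough for you to finish the last part of your homework. Can you work out condition 4 yourself?

Of course, you can take a different route from the one I suggest here (using next to get the number of the first line satisfying a condition, which requires Python 2.6 by the way, and any and all to check whether any or all items in a sequence satisfy a condition), but I was trying to match your request for maximum simplicity. If you find traditional for loops simpler than next, any and all, let us know and we'll show you how to recode these higher-level forms into those lower-level concepts!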
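
For readers who do prefer plain for loops, here is a minimal sketch of how the max/next expressions above could be spelled out. The helper names last_index and first_index are made up for this illustration and are not part of the original answer.

```python
def last_index(lines, prefix):
    """Index of the last line starting with prefix, or -1 if there is none."""
    last = -1
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            last = i  # keep overwriting, so the final value is the last match
    return last

def first_index(lines, prefix):
    """Index of the first line starting with prefix, or len(lines) if none."""
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            return i  # stop at the first match
    return len(lines)

# The example file from the question, as a list of lines.
lines = ["a b c d e f\n", "b c d e f g\n", "1 2 3 4 5 0\n"]

if last_index(lines, 'a') > first_index(lines, 'b'):
    raise ValueError("a line starting with 'a' follows a line starting with 'b'")
print("order of 'a' and 'b' lines is fine")
```

The fallback values (-1 and len(lines)) play the same role as the defaults in the generator versions: when there are no 'a' lines or no 'b' lines, the comparison can never fail.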

Answered 2025-04-15 by Python大师

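Here is a straightforward loop-based approach. It defines a function that determines the type of a line, from 1 to 3. We then iterate over the lines of the file one by one. An unknown line type, or a line type lower than one seen earlier, raises an error.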

def linetype(line):
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        # at least two integers followed by a zero
        parts = [int(x) for x in line.split()]
        if len(parts) >= 3 and parts[-1] == 0:
            return 3
    except ValueError:  # some token was not an integer
        pass
    raise Exception("Unknown Line Type")

maxtype = 0

with open("filename", "r") as f:
    for line in f:              # iterate over each line in the file
        line = line.strip()     # strip any whitespace
        if line == "":          # if we're left with a blank line
            continue            # continue to the next iteration

        lt = linetype(line)     # get the line type of the line
                                # or raise an exception if unknown type
        if lt >= maxtype:       # as long as our type is increasing
            maxtype = lt        # note the current type
        else:                   # otherwise line type decreased
            raise Exception("Out of Order")  # so raise exception

print("Validates")  # if we made it here, we validated
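
On the memory concern from the question: iterating over a file object reads one line at a time, so this kind of loop does not need the whole file in memory. As a rough sketch, the same idea can be wrapped in a function that accepts any iterable of lines, which makes it easy to try on in-memory text. The names line_type and validate, and the choice of ValueError, are assumptions made for this illustration.

```python
import io

def line_type(line):
    """Return 1 for 'a' lines, 2 for 'b' lines, 3 for valid numeric lines."""
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
    except ValueError:
        parts = []  # some token was not an integer
    if len(parts) >= 3 and parts[-1] == 0:
        return 3
    raise ValueError("unknown line type: %r" % line)

def validate(lines):
    """Check the lines in one pass; return how many non-blank lines were seen."""
    max_type = 0
    checked = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        current = line_type(line)
        if current < max_type:
            raise ValueError("line %d is out of order" % number)
        max_type = current
        checked += 1
    return checked

good = io.StringIO("a b c d e f\nb c d e f g\n1 2 3 4 5 0\n")
print(validate(good))  # prints 3

bad = io.StringIO("b c d\na b c\n")
try:
    validate(bad)
except ValueError as exc:
    print(exc)  # prints: line 2 is out of order
```

With a real file, something like validate(open(path)) would work the same way, since a file object is also an iterable of lines.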

Answered 2025-04-15 by Python大师
