How do I implement tail -F in Python?

Question

What is the Pythonic way to watch the tail end of a growing file for the occurrence of certain keywords?

In the shell, I might write something like:

tail -f "$file" | grep "$string" | while read hit; do
    #stuff
done

Answer 1

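Edit: as the comments below point out, O_NONBLOCK has no effect on files on disk. This is still useful to anyone who wants to read data from a socket, a named pipe, or another process, but it does not really answer the original question. The original answer is kept below for posterity. (Calling out to tail and grep would work, but in a sense that is not really an answer either.)

You could open the file with O_NONBLOCK, use select to check whether data is available to read, read the new data with read, and then filter the lines at the end of the file with string methods... or you could just use the subprocess module and let tail and grep do the work for you, as they would in the shell.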
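For what it's worth, here is a minimal sketch of the subprocess route, assuming a Unix-like system with a tail that accepts -n and -F. The function name follow_with_tail and its start parameter are made up for illustration, and the grep step is replaced by a plain substring test in Python.

```python
import subprocess

def follow_with_tail(path, needle, start="0"):
    """Yield lines from `path` that contain `needle`, read through the system tail.

    `start` is passed to `tail -n`: "0" means only newly appended lines,
    "+1" means start from the first line of the file (hypothetical parameter).
    """
    proc = subprocess.Popen(
        ["tail", "-n", start, "-F", path],  # -F keeps following across log rotation
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        for line in proc.stdout:
            if needle in line:  # stands in for the grep stage of the pipeline
                yield line.rstrip("\n")
    finally:
        # Runs when the caller stops iterating or closes the generator.
        proc.terminate()
        proc.stdout.close()
        proc.wait()
```

Since tail -F never exits on its own, the loop only ends when the caller stops iterating; closing the generator terminates the child process.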

Answer 2

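import time

def tail(f):
    # Jump to the end of the file so only newly appended lines are reported.
    f.seek(0, 2)

    while True:
        line = f.readline()

        if not line:
            # Nothing new yet; wait briefly before polling again.
            time.sleep(0.1)
            continue

        yield line

def process_matches(matchtext):
    # Coroutine: receives each line via send() and acts on matches.
    while True:
        line = (yield)
        if matchtext in line:
            do_something_useful() # email alert, etc. (placeholder, define your own)


list_of_matches = ['ERROR', 'CRITICAL']
matches = [process_matches(string_match) for string_match in list_of_matches]

for m in matches: # prime matches
    next(m)

# log_file_to_monitor is a placeholder for the path of the file to watch.
# tail() never returns, so one pass over the generator is enough.
with open(log_file_to_monitor) as logfile:
    for line in tail(logfile):
        for m in matches:
            m.send(line)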

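I use this to monitor log files. In the full implementation I keep list_of_matches in a configuration file so it can serve multiple purposes. One of my planned improvements is support for regular expressions instead of a simple 'in' match.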
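The regex improvement mentioned above could look roughly like this. It is only a sketch, not the full implementation: process_regex_matches and the on_match callback are hypothetical names, and the callback takes the place of do_something_useful().

```python
import re

def process_regex_matches(pattern, on_match):
    """Coroutine: receives lines via send() and calls on_match for each regex hit."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    while True:
        line = (yield)
        match = regex.search(line)
        if match:
            on_match(match, line)

hits = []
matcher = process_regex_matches(
    r"\b(ERROR|CRITICAL)\b",
    lambda match, line: hits.append(match.group(1)),
)
next(matcher)  # prime the coroutine, as with process_matches
matcher.send("ERROR: disk full\n")
matcher.send("no ERRORS here\n")  # \b prevents a hit on the longer word
matcher.send("CRITICAL: out of memory\n")
```

One pattern with alternation replaces the list of plain strings here, but a list of compiled patterns, one coroutine each, would work just as well.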

Answer 3

The simplest approach is to keep reading from the file, see what is new, and test it for matches.

import time

def watch(fn, words):
    fp = open(fn, 'r')
    while True:
        new = fp.readline()
        # Once all lines are read this just returns ''
        # until the file changes and a new line appears

        if new:
            for word in words:
                if word in new:
                    yield (word, new)
        else:
            time.sleep(0.5)

fn = 'test.py'
words = ['word']
for hit_word, hit_sentence in watch(fn, words):
    print("Found %r in line: %r" % (hit_word, hit_sentence))

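This readline-based approach works if you know your data will arrive in lines.

If the data is some kind of stream, you need a buffer larger than the largest word you are looking for, and you have to fill it first. That makes things a bit more complicated...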
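A rough sketch of that buffered variant, assuming the stream arrives as an iterable of string chunks. The name scan_stream and the choice to report absolute offsets are mine, not part of the answer above. The idea is to carry over the last len(longest word) - 1 characters between chunks so that a word split across a chunk boundary is still found.

```python
def scan_stream(chunks, words):
    """Yield (word, absolute_offset) for each occurrence of a word in a chunked stream."""
    keep = max(len(w) for w in words) - 1  # overlap carried from chunk to chunk
    buf = ""
    base = 0  # absolute stream offset of buf[0]
    for chunk in chunks:
        carried = len(buf)
        buf += chunk
        for word in words:
            # Only accept matches that end inside the new chunk, so a match
            # lying entirely in the carried-over part is not reported twice.
            pos = buf.find(word, max(0, carried - len(word) + 1))
            while pos != -1:
                yield word, base + pos
                pos = buf.find(word, pos + 1)
        # Keep just enough of the tail to catch a word spanning the next boundary.
        drop = max(0, len(buf) - keep)
        base += drop
        buf = buf[drop:]
```

For example, list(scan_stream(["xxER", "ROR yy CRIT", "ICAL"], ["ERROR", "CRITICAL"])) finds both words even though each one is split across two chunks.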
