Is it normal to go overboard on logging and exception handling in simple file-handling scripts?

6 votes
5 answers
1000 views
Asked 2025-04-16 02:18

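I find myself using Python for a lot of file-management scripts like the one below. While looking for examples online, I was surprised by how little logging and exception handling most of them contain. Every time I write a new script I don't set out to end up with something like the code below, but as soon as files are involved my paranoia takes over, and the result looks nothing like the examples I see online. As a newbie, I would like to know whether this is normal. If it isn't, how do you deal with the unknowns and with the fear of deleting valuable information?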

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested 
    sub-directories into the root directory and then deletes all sub-directories and nested 
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    RESTORE_BACKUP = False
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname,backup_dirname))
    except shutil.Error:
        log.error('shutil.Error: Error while trying to back up the directory')
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit()

    for root, dirs, files in os.walk(dirname, topdown=False):
        log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
        if root != dirname:
            for file in files:
                full_filename = os.path.join(root, file)
                try:
                    shutil.move(full_filename, dirname)
                    log.debug('"%s" copied to directory "%s"' % (file,dirname))
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('file "%s" could not be copied to directory "%s"' % (file,dirname))
                    log.error('flagging directory "%s" for reset' % dirname)
            if not RESTORE_BACKUP:
                try:
                    shutil.rmtree(root)
                    log.debug('directory "%s" deleted' % root)
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('directory "%s" could not be deleted' % root)
                    log.error('flagging directory "%s" for reset' % dirname)
        if RESTORE_BACKUP:
            break
    if RESTORE_BACKUP:
        RESTORE_FAIL = False
        try:
            shutil.rmtree(dirname)
        except shutil.Error:
            log.error('modified directory "%s" could not be deleted' % dirname)
            log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
            RESTORE_FAIL = True 
        if not RESTORE_FAIL:
            try:
                os.renames(backup_dirname, dirname)
                log.debug('back up of directory "%s" restored' % dirname)
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" restored' % dirname
                print '>******WARNING******'
                print '>'
            except WindowsError:
                log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
                log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname,dirname))
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" was NOT restored successfully' % dirname
                print '>no information is lost'
                print '>check the log file for information on manually restoring the directory'
                print '>******WARNING******'
                print '>'
    else:
        try:
            shutil.rmtree(backup_dirname)
            log.debug('back up of directory "%s" deleted' % dirname)
            log.info('directory "%s" successfully processed' % dirname)
            print '>directory "%s" successfully processed' % dirname
        except shutil.Error:
            log.error('backup directory "%s" could not be deleted' % backup_dirname)
            log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
            print '>'
            print '>******WARNING******'
            print '>directory "%s" successfully processed' % dirname
            print '>cleanup of backup directory "%s" failed' % backup_dirname
            print '>manual cleanup necessary'
            print '>******WARNING******'
            print '>'

5 Answers

2

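Paranoia can definitely obscure what your code is trying to do. That is a very bad thing, for several reasons: it hides bugs, it makes the program harder to modify when you need it to do something else, and it makes debugging harder.

Assuming Amoss can't cure you of your paranoia, here is how I might rewrite the program. Note that:

  • Each block of code that contains a lot of paranoia is split out into its own function.

  • Every time an exception is caught, it is re-raised, until it is finally caught in the main function. This removes the need for variables like RESTORE_BACKUP and RESTORE_FAIL.

  • The heart of the program (in flatten_dir) is now just 17 lines long, and is paranoia-free.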


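import logging
import msvcrt  # Windows-only; used for the "press any key" prompt in main
import os
import shutil
import sys
import time
import uuid

log = logging.getLogger(__name__)  # assumed; the question does not show how `log` is created

def backup_tree(dirname, backup_dirname):
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname,backup_dirname))
    except:
        log.error('Error trying to back up the directory')
        raise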

def move_file(full_filename, dirname):
    try:
        shutil.move(full_filename, dirname)
        # log the path that was actually passed in; `file` is not defined in this function
        log.debug('"%s" moved to directory "%s"' % (full_filename,dirname))
    except:
        log.error('file "%s" could not be moved to directory "%s"' % (full_filename,dirname))
        raise

def remove_empty_dir(dirname):
    try:
        os.rmdir(dirname)
        log.debug('directory "%s" deleted' % dirname)
    except:
        log.error('directory "%s" could not be deleted' % dirname)
        raise

def remove_tree_for_restore(dirname, backup_dirname):
    # backup_dirname is passed in so the log message can name the backup
    try:
        shutil.rmtree(dirname)
    except:
        log.error('modified directory "%s" could not be deleted' % dirname)
        log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
        raise

def restore_backup(backup_dirname, dirname):
    try:
        os.renames(backup_dirname, dirname)
        log.debug('back up of directory "%s" restored' % dirname)
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" restored' % dirname
        print '>******WARNING******'
        print '>'
    except:
        log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
        log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname,dirname))
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" was NOT restored successfully' % dirname
        print '>no information is lost'
        print '>check the log file for information on manually restoring the directory'
        print '>******WARNING******'
        print '>'
        raise

def remove_backup_tree(backup_dirname, dirname):
    # dirname is passed in because the messages below refer to it
    try:
        shutil.rmtree(backup_dirname)
        log.debug('back up of directory "%s" deleted' % dirname)
        log.info('directory "%s" successfully processed' % dirname)
        print '>directory "%s" successfully processed' % dirname
    except:
        # rmtree failures surface as OSError, not shutil.Error, so catch everything and re-raise
        log.error('backup directory "%s" could not be deleted' % backup_dirname)
        log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
        print '>'
        print '>******WARNING******'
        print '>directory "%s" successfully processed' % dirname
        print '>cleanup of backup directory "%s" failed' % backup_dirname
        print '>manual cleanup necessary'
        print '>******WARNING******'
        print '>'
        raise

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested
    sub-directories into the root directory and then deletes all sub-directories and nested
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    backup_tree(dirname, backup_dirname)
    try:
        for root, dirs, files in os.walk(dirname, topdown=False):
            log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
            if root != dirname:
                for file in files:
                    full_filename = os.path.join(root, file)
                    move_file(full_filename, dirname)
                # remove the sub-directory that was just emptied, not the top-level directory
                remove_empty_dir(root)
    except:
        remove_tree_for_restore(dirname, backup_dirname)
        restore_backup(backup_dirname, dirname)
        raise
    else:
        remove_backup_tree(backup_dirname, dirname)

def main(dirname):
    try:
        flatten_dir(dirname)
    except:
        import traceback  # print_exc lives in traceback, not in the exceptions module
        log.exception('error flattening directory "%s"' % dirname)
        traceback.print_exc()
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit(1)  # non-zero exit status signals the failure to the caller
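
The wrapper functions above all share one shape: do the work, log success, and on failure log an error and re-raise. As a rough sketch only (Python 3 syntax, and not part of the original answer), that shape could be factored into a context manager. The name `logged` and the logger setup are assumptions made for illustration.

```python
import logging
from contextlib import contextmanager

log = logging.getLogger(__name__)  # assumed logger; configure it however the script does


@contextmanager
def logged(action):
    """Log success at DEBUG level, or log the failure and re-raise it."""
    try:
        yield
        log.debug("%s: done", action)
    except Exception:
        log.error("%s: failed", action)
        raise  # the caller (ultimately main) still sees the original exception
```

With something like this, a call such as `with logged('delete "%s"' % root): os.rmdir(root)` would stand in for a dedicated wrapper function, at the cost of less specific error messages.
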
3

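A little bit of paranoia is OK, but there are different kinds of paranoia. During development I use a lot of debugging statements so that I can see where I'm going wrong (if I am going wrong). Sometimes I leave these statements in, but use a flag to control whether they are displayed (pretty much a debug flag). You can also have a "verbosity" flag to control how much logging you do.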
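
A verbosity flag of that kind might look roughly like the sketch below (Python 3). The `-v` option and the helper names `level_for` and `configure_logging` are illustrative choices, not something taken from the question.

```python
import argparse
import logging


def level_for(verbosity):
    """Map a count of -v flags to a logging level (quiet by default)."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    return levels[min(verbosity, len(levels) - 1)]


def configure_logging(argv=None):
    """Parse -v/-vv from argv (sys.argv when None) and set the root log level."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="repeat for more detail (-v, -vv)")
    args = parser.parse_args(argv)
    logging.basicConfig(level=level_for(args.verbose))
    return args
```

The debug statements can then stay in the script permanently; whether they show up is decided on the command line rather than by editing the code.
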

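The other kind of paranoia is about sanity checks. It comes into play when you rely on external data or tools, that is, on pretty much anything that doesn't come from inside your program. In that case it never hurts to be paranoid, especially about data you receive: never fully trust it.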
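
For a script like the one in the question, the external data is mostly the directory name it is handed. A minimal sketch of checking it up front (Python 3) could look like this; the function name `checked_root` and the particular checks are assumptions chosen for illustration, and a real script may need different ones.

```python
import os


def checked_root(path):
    """Return the absolute form of `path`, or raise ValueError if it looks unsafe."""
    full = os.path.abspath(path)
    if not os.path.isdir(full):
        raise ValueError("not a directory: %r" % path)
    if os.path.islink(full):
        raise ValueError("refusing to follow a symlink: %r" % path)
    if os.path.dirname(full) == full:
        # a path that is its own parent is a filesystem root
        raise ValueError("refusing to operate on a filesystem root: %r" % path)
    return full
```

Failing early like this happens before anything has been modified, so there is nothing to roll back.
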

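It's also OK to be paranoid when you are checking whether an operation completed successfully. That is simply part of normal error handling. I notice that you are performing operations such as deleting directories and files. These operations can fail, so you have to deal with the case where they do. If you simply ignore it, your code could end up in an indeterminate or undefined state and could do bad (or at least undesirable) things.

As for the log files and debug files, you can leave them in if you like. I usually do a decent amount of logging, enough to tell me what is going on. How much is enough is subjective. The key is to make sure you don't drown in logging, where there is so much information that you can't easily pick out what you need. In general, logging helps you figure out what went wrong when a script you wrote suddenly stops working. Instead of stepping through the program, you can get a rough idea of where the problem is by going through the logs.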

8

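Learning to let go (or how I learned to live with the problem)...

Ask yourself this: what exactly are you afraid of, and how will you handle it if it happens? In the example you gave, you want to avoid data loss. Your approach is to look for every combination of conditions that you think could be an error and to wrap each one in a large amount of logging. Things can still go wrong, and it isn't clear that all this logging is a good way to deal with them. Let's sketch out what you are trying to achieve: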

for each file in a tree
  if file is below the root
    move it into the root
if nothing went wrong
  delete empty subtrees

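So what kinds of things can go wrong in this process? Well, the file move operations can fail in many ways because of the underlying file system. Can we list them all and provide a nice way of dealing with each one? No... but in general you will deal with all of them in the same way. Sometimes an error is just an error, regardless of what it is.

So in this case, if any error occurs you want to abort and undo any changes. The way you have decided to do that is to create a backup copy and restore it when something goes wrong. But the most likely error is a full file system, in which case these steps are themselves likely to fail. OK, so this is a common enough problem: if you are worried about unknown errors at any point, how do you stop your restoration path from going wrong?

The general answer is to do all of the intermediate work first, and then take a single troublesome (hopefully atomic) step. In your case, you need to flip your recovery around. Instead of building a backup copy, build a copy of the result. If everything succeeds, you can then swap the new result in for the old original tree. If you are really paranoid, you can leave that step for a human. The advantage is that if something goes wrong, you can simply abort and throw away the partial state you have constructed.

Your structure then becomes: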

make empty result directory
for every file in the tree
  copy file into new result
on failure abort otherwise
  move result over old source directory

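By the way, there is a bug in your current script that this pseudocode makes more obvious: if files in different branches have identical names, they will overwrite each other in the new flattened version.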
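
One way to avoid that overwrite, sketched here in Python 3 and not part of the original answer, is to pick a target name that is not yet taken. The helper name `unique_target` and the `_1`, `_2` suffix scheme are assumptions for illustration.

```python
import os


def unique_target(dest_dir, name):
    """Return a path inside dest_dir that does not exist yet, adding _1, _2, ... if needed."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(dest_dir, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, "%s_%d%s" % (stem, counter, ext))
        counter += 1
    return candidate
```

The check and the later copy are two separate steps, so this is only safe when nothing else is writing into `dest_dir` at the same time. Whether renaming is acceptable at all, or whether a collision should abort the run instead, depends on what the flattened directory is used for.
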

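The second point about this pseudocode is that all of the error handling is in one place: wrap the creation of the new directory and the recursive copy inside a single try block and catch all the errors after it. This solves your original problem of the large ratio of logging and error checking to actual work code.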

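import os
import shutil
import sys
import uuid

# `dirname` is the directory to flatten; it is assumed to be defined by the caller.
backup_dirname = str(uuid.uuid4())
try:
    os.mkdir(backup_dirname)  # shutil has no mkdir(); os.mkdir creates the result directory
    for root, dirs, files in os.walk(dirname, topdown=False):
        for file in files:
            full_filename = os.path.join(root, file)
            target_filename = os.path.join(backup_dirname, file)
            shutil.copy(full_filename, target_filename)
except Exception as e:
    sys.stderr.write("Something went wrong %s\n" % e)
    sys.exit(1)
# shutil.move would nest the result inside an existing directory, so the old tree goes first
shutil.rmtree(dirname)
shutil.move(backup_dirname, dirname)      # I would do this bit by hand really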
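
If the final swap is scripted rather than done by hand, one cautious variant is to rename the original aside first and delete it only once the result is in place. This is a sketch in Python 3 under stated assumptions: both directories are on the same file system (so `os.rename` works), and the `.old` suffix and the name `swap_in` are made up for illustration.

```python
import os
import shutil


def swap_in(result_dir, target_dir):
    """Replace target_dir with result_dir, keeping the old tree until the new one is in place."""
    target_dir = os.path.abspath(target_dir)  # also strips any trailing separator
    old_dir = target_dir + ".old"             # hypothetical naming scheme
    if os.path.exists(old_dir):
        raise OSError("leftover directory in the way: %r" % old_dir)
    os.rename(target_dir, old_dir)            # step 1: move the original aside
    try:
        os.rename(result_dir, target_dir)     # step 2: move the result into place
    except OSError:
        os.rename(old_dir, target_dir)        # undo step 1 so nothing is lost
        raise
    shutil.rmtree(old_dir)                    # only now discard the original
```

The two renames are not atomic as a pair, so a crash between them leaves the original under the `.old` name, but at no point is the only copy of the data deleted.
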
