How do I write a Python script to recover files from a formatted drive?

1 vote
3 answers
15160 views
Asked 2025-04-18 04:23

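I want to write a Python script that recovers files from a formatted hard drive. I know that formatting does not actually erase the data on the disk; it only marks that space as available to be overwritten. So how can I recover the files that have not been overwritten yet?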

3 Answers

1

You can use SleuthKit for this:

import argparse
import subprocess
import re
import os

# type codes printed by fls; '-' comes first so that it stays a literal
# hyphen when the codes are joined into a regex character class
TYPECODES = ['-', 'r', 'd', 'b', 'l', 'p', 's', 'w', 'v']
DESCRIPTIONS = [
    'unknown type',
    'regular file',
    'directory',
    'block device',
    'symbolic link',
    'named FIFO',
    'shadow file',
    'whiteout file',
    'TSK virtual file',
]
TYPEDICT = dict(zip(TYPECODES, DESCRIPTIONS))

parser = argparse.ArgumentParser(
    description='Recover files from a disk image using SleuthKit',
)
parser.add_argument(
    'image', type=str, nargs=1, help='path to disk image or mount point',
)
parser.add_argument(
    '-o', '--output', type=str, nargs='?', dest='output', default='recovered',
    help=('output extracted files to this directory [default=./recovered/]'),
)
parser.add_argument(
    '-v', '--verbose', dest='verbose', action='store_true',
    default=False, help=('print progress message'),
)


def recover(imgpath, outpath, verbose=False):

    # check that we can open image
    try:
        with open(imgpath, 'r'):
            pass
    except IOError:
        print('Unable to open %s. Check that the path is '
              'correct, and that you have read permission.' % imgpath)
        return

    # if the output directory exists, check that it's writeable
    if os.path.isdir(outpath):
        if not os.access(outpath, os.W_OK):
            print('Output directory %s is not writeable - check permissions'
                  % outpath)
            return
    # otherwise create it
    else:
        try:
            os.makedirs(outpath)
        except OSError:
            print('Could not create output directory %s - check permissions'
                  % outpath)
            return

    cmd = ['fls', '-i', 'raw', '-p', '-r', imgpath]

    # read the output as text (not bytes) so the str regex below can match it
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)
    out, err = p.communicate()

    if p.returncode:
        print('Command "%s" failed:\n%s' % (' '.join(cmd), err))
        return

    ft = ''.join(TYPECODES)
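    # type/type, '*' (deleted entry), metadata address, path; the address may
    # contain hyphens on NTFS (e.g. 45-128-1)
    regex = r'([%s])/([%s])\s+\*\s+([\d-]+):\s+(.*)' % (ft, ft)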
    success = {}
    failure = {}
    skipped = {}

    for ftype, mtype, inode, relpath in re.findall(regex, out):

        recpath = os.path.join(outpath, relpath)
        recdir, recname = os.path.split(recpath)
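        item = {relpath: [imgpath, relpath]}

        # don't try to recover directories ('d' is the fls code for a
        # directory), or anything that would overwrite an output directory
        if 'd' in (ftype, mtype) or os.path.isdir(recpath):
            continue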

        # only worth recovering deleted files
        elif (ftype in ('r', 'd')) and (mtype in ('r', 'd')):
            if not os.path.isdir(recdir):
                if os.path.exists(recdir):
                    os.remove(recdir)
                os.makedirs(recdir)
            cmd = ['icat', '-i', 'raw', '-r', imgpath, inode]
            with open(recpath, 'wb', 4096) as outfile:
                err = subprocess.call(cmd, stdout=outfile, bufsize=4096)
            if err:
                msg = '[FAILED]'
                failure.update(item)
            else:
                msg = '[RECOVERED]'
                success.update(item)
            if verbose:
                if ftype != mtype:
                    realloc_msg = (
                        '[WARNING: file name structure (%s) '
                        'does not match metadata (%s)]'
                        % (TYPEDICT[ftype], TYPEDICT[mtype]))
                else:
                    realloc_msg = ''
                print('%s %s:%s --> %s %s'
                       % (msg, imgpath, inode, recpath, realloc_msg))
        else:
            # skip unknown/other file types
            if verbose:
                print('[SKIPPED] %s:%s [%s / %s]'
                       % (imgpath, inode, TYPEDICT[ftype], TYPEDICT[mtype]))
            skipped.update(item)

    print('-' * 50)
    nsuccesses = len(success)
    nfailures = len(failure)
    nskipped = len(skipped)
    print('%i files successfully recovered to %s'
          % (nsuccesses, outpath))
    print('%i files skipped' % nskipped)
    print('%i files could not be successfully recovered' % nfailures)
    if nfailures:
        print('\n'.join([(' * ' + pth) for pth in failure.keys()]))
    print('-' * 50)

if __name__ == '__main__':
    args = parser.parse_args()
    imgpath = args.image[0]
    outpath = args.output
    recover(imgpath, outpath, verbose=args.verbose)

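Just copy and paste the script into PyCharm and run it. It calls the SleuthKit command-line tools fls and icat, so they need to be installed and on your PATH.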
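To make the parsing step in the script easier to follow, here is a minimal sketch of how deleted entries can be picked out of fls output. The helper name parse_deleted_entries and the sample text are assumptions made for illustration; the sample is hand-written in the style of fls output, not captured from a real image.

```python
import re

# Same type codes as the script; '-' is first so it is literal inside [...]
TYPE_CHARS = '-rdblpswv'

# name type / metadata type, '*' marking a deleted entry,
# then the metadata address and the path
FLS_DELETED = re.compile(
    r'^([%s])/([%s])\s+\*\s+([\d-]+):\s+(.*)$' % (TYPE_CHARS, TYPE_CHARS),
    re.MULTILINE,
)


def parse_deleted_entries(fls_output):
    """Return (name_type, meta_type, address, path) tuples for deleted entries."""
    return FLS_DELETED.findall(fls_output)


# Hand-written sample, assumed to resemble what `fls -r -p` prints
sample = (
    'r/r 4:\tkept.txt\n'
    'r/r * 5:\tdocs/report.txt\n'
    'd/d * 7:\told_photos\n'
)

for entry in parse_deleted_entries(sample):
    print(entry)
```

Entries without the asterisk are still allocated, so the pattern ignores them. The address from each match is what the script passes to icat.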

1

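Perhaps this question is not really about Python scripting but about file recovery. If so, the strategy you need depends on the disk format (file system) and the operating system you are using.

You can try recovering the files without Python, by relying on features of the file system and the operating system to restore deleted files.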
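If you do want to stay in Python, one way to find out which file system you are dealing with is to look at well-known on-disk signatures. The sketch below is only an illustration: guess_filesystem is a hypothetical helper, it assumes the image holds a single volume (not a whole disk with a partition table), and it only knows a handful of signatures. On a freshly formatted drive it will report the new file system, not the old one.

```python
import struct


def guess_filesystem(image_path):
    """Guess the file system of a single-volume image from its signatures."""
    with open(image_path, 'rb') as img:
        head = img.read(2048)

    # NTFS and exFAT store an 8-byte name at offset 3 of the boot sector
    if head[3:11] == b'NTFS    ':
        return 'NTFS'
    if head[3:11] == b'EXFAT   ':
        return 'exFAT'
    # FAT32 stores its type string at offset 82 of the boot sector
    if head[82:90] == b'FAT32   ':
        return 'FAT32'
    # ext2/3/4: superblock starts at byte 1024, magic 0xEF53 at offset 56
    if len(head) >= 1082 and struct.unpack_from('<H', head, 1080)[0] == 0xEF53:
        return 'ext2/3/4'
    return 'unknown'
```

Calling guess_filesystem('disk.img') (the path is a placeholder) returns a short label that you can use to decide which recovery tool or approach to try.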

1

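A script like this may not work well, because Python's functions and the C libraries are designed to work on an intact file system. If you want to recover data, what you really need to do is read directly from the disk. So perhaps you should rephrase your question to ask about that.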
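As a rough illustration of what reading directly from the disk can look like, here is a minimal sketch of signature-based carving: it scans raw bytes for JPEG start and end markers and ignores the file system entirely. The function name carve_jpegs and the max_size limit are assumptions for this example. It only finds files stored contiguously, and embedded thumbnails can end a match early, so treat the results as candidates rather than guaranteed files.

```python
JPEG_START = b'\xff\xd8\xff'  # start-of-image marker plus first marker byte
JPEG_END = b'\xff\xd9'        # end-of-image marker


def carve_jpegs(data, max_size=20 * 1024 * 1024):
    """Yield byte strings in data that look like complete JPEG files."""
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start == -1:
            return
        # look for the end marker within max_size bytes of the start
        end = data.find(JPEG_END, start + len(JPEG_START), start + max_size)
        if end == -1:
            # no plausible end: skip this start marker and keep scanning
            pos = start + 1
            continue
        yield data[start:end + len(JPEG_END)]
        pos = end + len(JPEG_END)


# Tiny in-memory demonstration with made-up bytes instead of a real image
blob = b'\x00' * 16 + JPEG_START + b'fake image data' + JPEG_END + b'\x00' * 16
found = list(carve_jpegs(blob))
print(len(found))
```

On a real image you would pass in bytes read with open(path, 'rb'), ideally from a copy of the drive rather than the drive itself. Reading a raw device usually requires administrator rights, and large images are better processed in chunks or through mmap than loaded whole.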
