Python: How to write a generic file reader with format plugins

Asked 2025-04-16 14:45

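I'm trying to write a generic reader for medical image formats so that we can handle the various image formats we run into. Since I wanted to learn along the way, I figured I'd copy how the professionals do it, so I looked at how PIL (the Python Imaging Library) reads files generically, specifically its format handling.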

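As I understand it, PIL has an open function that loops over a list of candidate accept functions. When it finds one that accepts the file, it uses the matching factory function to create the appropriate object.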
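The dispatch loop described above fits in a few lines. This is a minimal sketch, not PIL's actual code: it assumes each plugin is a hypothetical `(format id, accept, factory)` entry in a plain list, that `accept` only inspects a short prefix of the data, and that the made-up magic prefixes `b"ALPH"` and `b"BETA"` stand in for real format signatures.

```python
from typing import Callable, List, Tuple

# Hypothetical registry: (format id, accept, factory), tried in order.
_REGISTRY: List[Tuple[str, Callable[[bytes], bool], Callable[[bytes], object]]] = []


def register(fmt_id, accept, factory):
    """Add a plugin; plugins registered earlier are tried first."""
    _REGISTRY.append((fmt_id, accept, factory))


def identify(data: bytes):
    """Build an object with the first plugin whose accept() returns True."""
    head = data[:16]                      # accept() only sees a short prefix
    for fmt_id, accept, factory in _REGISTRY:
        if accept(head):
            return fmt_id, factory(data)
    raise OSError("cannot identify format")


# Two toy formats told apart by made-up magic prefixes.
register("alpha", lambda head: head.startswith(b"ALPH"), lambda d: d[4:])
register("beta", lambda head: head.startswith(b"BETA"), lambda d: d[4:].upper())

print(identify(b"BETAhello"))   # ('beta', b'HELLO')
```

The order of registration matters: if two accept functions could both match, the one registered first wins.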

So I gave it a try myself. Here is my simplified code:


import os
import sys

pluginID = []     # list of all registered plugin IDs, in registration order
OPEN = {}         # maps plugin ID -> (factory, accept) tuple; accept may be None

_initialized = False

def moduleinit():
    '''Explicitly initializes the library.  This function
    loads all available file format drivers.

    This routine has been lifted from PIL, the Python Imaging Library'''

    global _initialized
    if _initialized:
        return

    visited = set()

    directories = list(sys.path)

    try:
        directories.append(os.path.dirname(__file__))
    except NameError:
        pass

    # only check directories (including current, if present in the path)
    for directory in filter(isDirectory, directories):
        fullpath = os.path.abspath(directory)
        if fullpath in visited:
            continue
        for file in os.listdir(directory):
            if file.endswith("TestReaderPlugin.py"):
                f, e = os.path.splitext(file)
                try:
                    sys.path.insert(0, directory)
                    try: # FIXME: this will not reload and hence pluginID
                        # will be unpopulated leading to "cannot identify format"
                        __import__(f, globals(), locals(), [])
                    finally:
                        del sys.path[0]
                except ImportError as exc:
                    print(f, ":", exc)
        visited.add(fullpath)

    if OPEN:
        _initialized = True
        return 1

class Reader:
    '''Base class for image file format handlers.'''
    def __init__(self, fp=None, filename=None):

        self.filename = filename
        self.fp = fp  # use the handle passed in by open(), if any

        if self.fp is None and isStringType(filename):
            # builtins.open, because this module's own open() shadows it
            import builtins
            self.fp = builtins.open(filename, "rb")  # image data is binary

        # this may fail if not implemented
        self._open() # unimplemented in base class but provided by plugins

    def _open(self):
        raise NotImplementedError(
            "Reader subclass must implement _open"
            )


# this is the generic open that tries to find the appropriate handler
def open(fp):
    '''Probe an image file

    Supposed to attempt all opening methods that are available. Each
    of them is supposed to fail quickly if the filetype is invalid for its
    respective format'''

    filename = fp

    moduleinit() # make sure we have access to all the plugins

    for i in pluginID:
        try:
            factory, accept = OPEN[i]
            if accept:
                # accept is expected to either return None (if unsuccessful)
                # or hand back a file handle to be used for opening.
                # Keep it in a separate name so a rejected file does not
                # overwrite the filename handed to the next plugin.
                handle = accept(filename)
                if handle:
                    handle.seek(0)
                    return factory(handle, filename=filename)
        except (SyntaxError, IndexError, TypeError):
            pass # I suppose that factory is allowed to have these
            # exceptions for problems that weren't caught with accept()
            # hence, they are simply ignored and we try the other plugins

    raise IOError("cannot identify format")

# --------------------------------------------------------------------
# Plugin registry

def register_open(id, factory, accept=None):
    pluginID.append(id)
    OPEN[id] = factory, accept

# --------------------------------------------------------------------
# Internal:

# type stuff

def isStringType(t):
    return isinstance(t, str)

def isDirectory(f):
    '''Checks if an object is a string, and that it points to a directory'''
    return isStringType(f) and os.path.isdir(f)

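Behind the scenes, one important step happens the first time you try to open a file: the plugins for all formats get registered (moduleinit). Every eligible plugin must live on an accessible path and be named *TestReaderPlugin.py; it is then imported dynamically. Each plugin module has to call register_open, supplying an ID, a method that creates the file object, and an accept function that tests a candidate file.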
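One way to do the dynamic-import step without walking every entry of `sys.path` is to load plugin files from a single known directory with `importlib.util`. This is an alternative technique, not what PIL does; `load_plugins` and its `suffix` default are hypothetical names chosen to match the `*TestReaderPlugin.py` convention. Because `exec_module` runs each file's top-level code on every call instead of returning a cached module from `sys.modules`, calling it again also re-runs any `register_open` calls the plugins make.

```python
import importlib.util
import os


def load_plugins(directory, suffix="TestReaderPlugin.py"):
    """Execute every file in `directory` whose name ends with `suffix`.

    Each matching file is executed afresh on every call; the resulting
    module objects are not stored in sys.modules, so nothing is cached.
    Returns the loaded modules in file-name order.
    """
    modules = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(suffix):
            continue
        path = os.path.join(directory, filename)
        module_name = os.path.splitext(filename)[0]
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # runs the plugin's top-level code
        modules.append(module)
    return modules
```

A trade-off of skipping `sys.modules` is that plugins cannot import each other by name; for independent format plugins that is usually acceptable.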

An example plugin would look like this:


import TestReader

def _accept(filename):
    fp = open(filename, "rb")  # image data is binary
    # we made it here, so let's just accept this format
    return fp

class exampleTestReader(TestReader.Reader):
    format='example'

    def _open(self):
        self.data = self.fp.read()

TestReader.register_open('example', exampleTestReader, accept=_accept)

TestReader.open() is the function users will call:

import TestReader
a=TestReader.open(filename) # easy

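So where's the problem? First, I'm still looking for a more Pythonic approach. Is this approach sound? My doubts come from the moduleinit stage: its magic looks a bit messy, and that part was copied straight from PIL. The main problem: if you reload(TestReader), everything stops working, because the ID list is reset to [] but the plugins are not reloaded.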
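The reload failure comes from import caching: `reload(TestReader)` re-executes the body of `TestReader`, so `pluginID`, `OPEN` and `_initialized` are reset, but `__import__` finds each plugin already in `sys.modules` and returns it without running its body again, so `register_open` is never called a second time. A minimal sketch of one fix, using a hypothetical helper named `import_or_reload`, is to re-execute plugins that were already imported:

```python
import importlib
import sys


def import_or_reload(module_name):
    """Import `module_name`, re-running its body if it is already imported.

    A plain __import__ returns the cached module from sys.modules without
    executing it again, so registration calls in its body would be skipped.
    """
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)
```

In the question's moduleinit, the `__import__(f, globals(), locals(), [])` line would become `import_or_reload(f)`. Since reloading `TestReader` also resets `_initialized` to `False`, the next `open()` call would then run moduleinit again and repopulate the registry. Another common design is to keep the registry in a small separate module that is never reloaded.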

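Is there a better way to set up a generic reader that
1. handles every format through a single open(filename) call,
2. only requires well-encapsulated plugins for whichever formats you want to support,
3. keeps working after a reload?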

1 Answer


Some guidelines:

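  1. Use the idea of a "peek" buffer to test whether the file contains data you understand.
  2. Users don't want to know the names of the importers (what if you have 100 of them?), so use a "facade" interface such as medicimage.open(filepath).
  3. To make reloading work, you need to implement some logic for it; there are plenty of examples online to draw on.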
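The first two guidelines can be combined: read a small "peek" buffer once, let every registered plugin inspect it, then hand the rewound file to the first reader that accepts it, all behind one facade function. This is a sketch under stated assumptions: `PEEK_SIZE`, `register_format`, `open_image` and the `TOY1` format are hypothetical, and registration uses a class decorator instead of the question's `register_open` call.

```python
PEEK_SIZE = 64   # hypothetical: number of leading bytes each accept() sees
_FORMATS = []    # (name, accept, factory) entries, tried in registration order


def register_format(name, accept):
    """Class decorator that registers the decorated class as a reader."""
    def decorator(cls):
        _FORMATS.append((name, accept, cls))
        return cls
    return decorator


def open_image(path):
    """Facade: callers never need to know which plugin handles `path`."""
    fp = open(path, "rb")
    try:
        head = fp.read(PEEK_SIZE)          # the "peek" buffer, read once
        for _name, accept, factory in _FORMATS:
            if accept(head):
                fp.seek(0)                 # rewind so the reader sees everything
                return factory(fp)         # the reader now owns fp
        raise OSError(f"cannot identify format of {path!r}")
    except BaseException:
        fp.close()                         # on no match or reader failure, close
        raise


@register_format("toy", accept=lambda head: head.startswith(b"TOY1"))
class ToyReader:
    """Reader for a made-up format: the magic bytes b"TOY1", then a payload."""

    def __init__(self, fp):
        with fp:                           # read everything, then close
            self.payload = fp.read()[4:]
```

The facade is named `open_image` here so it does not shadow the built-in `open` inside its own module (the reason the question's `Reader` has to fetch `open` from the builtins module explicitly); a package could still re-export it to callers under a name like `medicimage.open`.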
