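Extract meta-data from DICOM and NIFTI files
Detailed description of the data-tracking Python project
MRI Meta-data Extractor
This is a Python library that scans folders, extracts meta-data from files (DICOM, NIFTI, ...), and stores it in a database.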
Install
Run pip install data-tracking (only tested with Python 3).
Usage
Import the functions you need like this: from data_tracking.files_recording import create_provenance, visit.
Create a provenance entity using:
create_provenance(dataset, software_versions, db_url)

Create (or get if already exists) a provenance entity, store it in the database and get back a provenance ID.

* param dataset: Name of the data set.
* param software_versions: (optional) Version of the software components used to get the data. It is a dictionary that accepts the following fields:
    - matlab_version
    - spm_version
    - spm_revision
    - fn_called
    - fn_version
    - others
* param db_url: (optional) Database URL. If not defined, it looks for an Airflow configuration file.
* return: Provenance ID.
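As a minimal sketch of how the software_versions dictionary might be assembled before calling create_provenance: the helper names build_software_versions and record_provenance are hypothetical (they are not part of data-tracking), and the version strings are placeholders. Only the field names and the argument order come from the description above.

```python
# Field names taken from the create_provenance description above.
ALLOWED_VERSION_FIELDS = {
    "matlab_version",
    "spm_version",
    "spm_revision",
    "fn_called",
    "fn_version",
    "others",
}


def build_software_versions(**fields):
    """Hypothetical helper: build a software_versions dict, rejecting unknown keys."""
    unknown = set(fields) - ALLOWED_VERSION_FIELDS
    if unknown:
        raise ValueError("Unknown software_versions fields: %s" % sorted(unknown))
    return dict(fields)


def record_provenance(dataset, db_url, **fields):
    """Hypothetical wrapper: create (or fetch) a provenance entity, return its ID."""
    # Imported lazily so build_software_versions works without the package installed.
    from data_tracking.files_recording import create_provenance

    # Arguments are passed in the order given in the signature above.
    return create_provenance(dataset, build_software_versions(**fields), db_url)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(build_software_versions(matlab_version="2016b", spm_version="SPM12"))
```

Validating the keys up front catches a misspelled field before anything is written to the database. Calling record_provenance requires data-tracking to be installed and a reachable database.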
Scan a folder to populate the database:
def visit(folder, provenance_id, step_name, previous_step_id, config, db_url)

Record all files from a folder into the database. The files are listed in the DB. If a file has been copied from the previous step without any transformation, this is detected and marked in the DB. The type of each file is detected and stored in the DB. If a file (e.g. a DICOM file) contains meta-data, it is stored in the DB as well.

* param folder: Folder path.
* param provenance_id: Provenance label.
* param step_name: Name of the processing step that produced the folder to visit.
* param previous_step_id: (optional) Previous processing step ID. If not defined, we assume this is the first processing step.
* param config: List of flags:
    - boost: (optional) When enabled, we consider that all the files from the same folder share the same meta-data. Processing is then about 2 times faster. This option is enabled by default.
    - session_id_by_patient: Rarely, a data set might use study IDs that are unique per patient (not for the whole study), e.g. LREN data. In such a case, you have to enable this flag. This will use PatientID + StudyID as a session ID.
    - visit_id_in_patient_id: Rarely, a data set might mix patient IDs and visit IDs, e.g. LREN data. In such a case, you have to enable this flag. This will try to split PatientID into VisitID and PatientID.
    - visit_id_from_path: Enable this flag to get the visit ID from the folder hierarchy instead of DICOM meta-data (e.g. can be useful for PPMI).
    - repetition_from_path: Enable this flag to get the repetition ID from the folder hierarchy instead of DICOM meta-data (e.g. can be useful for PPMI).
* param db_url: (optional) Database URL. If not defined, it looks for an Airflow configuration file.
* param is_organised: (optional) Disable this flag when scanning a folder that has not been organised yet (should only affect NIFTI files).
* return: Processing step ID.
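The following sketch shows one way several processing steps might be recorded in sequence, feeding the step ID returned by each visit call into the next one. It rests on assumptions that the description above does not confirm: that config is a list of flag-name strings, and that passing None as previous_step_id means "not defined". The helpers build_config and record_pipeline are hypothetical and not part of data-tracking.

```python
# Flag names taken from the visit description above.
KNOWN_FLAGS = (
    "boost",
    "session_id_by_patient",
    "visit_id_in_patient_id",
    "visit_id_from_path",
    "repetition_from_path",
)


def build_config(**enabled):
    """Hypothetical helper: return the list of enabled flag names.

    Assumes config is a list of strings. 'boost' defaults to enabled,
    mirroring the documented default; every other flag defaults to disabled.
    """
    unknown = set(enabled) - set(KNOWN_FLAGS)
    if unknown:
        raise ValueError("Unknown flags: %s" % sorted(unknown))
    return [flag for flag in KNOWN_FLAGS if enabled.get(flag, flag == "boost")]


def record_pipeline(steps, provenance_id, config, db_url):
    """Hypothetical wrapper: visit each (folder, step_name) pair in order.

    Returns the ID of the last processing step recorded.
    """
    # Imported lazily so build_config works without the package installed.
    from data_tracking.files_recording import visit

    step_id = None  # assumption: None stands for "no previous step"
    for folder, step_name in steps:
        # The ID returned for this step becomes previous_step_id for the next one.
        step_id = visit(folder, provenance_id, step_name, step_id, config, db_url)
    return step_id


if __name__ == "__main__":
    print(build_config(visit_id_from_path=True))
```

Chaining the IDs this way is what would let the library relate a folder to the step before it, for example to detect files copied without transformation. Calling record_pipeline requires data-tracking to be installed and a reachable database.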
Build
Run ./build.sh. (Builds for Python 3.)
(The build includes a step based on README.md.)
Test
Enter the tests directory.
With Docker
Run ./test.sh
Without Docker
- Run a Postgres database on localhost:5432.
- Run nosetests unit_test.py
Publish on PyPI
Run ./publish.sh.
(This builds the project prior to pushing it to PyPI.)
NOTE: Do not forget to update the version number in setup.py before publishing.
Notes
- This project contains a reference to a git submodule. You can use the --recursive flag when cloning the project to clone the submodule too.
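Acknowledgements
This work has been funded by the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no. 604102 (HBP).
This work is part of SP8 of the Human Brain Project (SGA1).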