fmrib-unpack Python package - PyPI

The FMRIB UK BioBank normalisation, parsing and cleaning kit

Detailed description of the fmrib-unpack Python project

https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg

FUNPACK is a Python library for pre-processing UK BioBank data.

FUNPACK is developed at the Wellcome Centre for Integrative Neuroimaging (WIN@FMRIB), University of Oxford. FUNPACK is in no way endorsed, sanctioned, or validated by the UK BioBank.
FUNPACK comes bundled with metadata about the variables present in UK BioBank data sets. This metadata can be obtained from the UK BioBank online data showcase.

Installation

Install FUNPACK via pip:

pip install fmrib-unpack

Or from conda-forge:

conda install -c conda-forge fmrib-unpack

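Introductory notebook

The funpack_demo command starts a Jupyter Notebook which introduces the main features provided by FUNPACK. To run it, you need to install a few additional dependencies: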

pip install fmrib-unpack[demo]

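Note

If you have installed FUNPACK into an FSL environment, you will need to install the notebook dependencies like so (you may need administrative privileges):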

source $FSLDIR/fslpython/bin/activate fslpython
pip install fmrib-unpack[demo]

You can then start the demo by running funpack_demo.

Note

The introductory notebook uses bash, so it is unlikely to work on Windows.

Usage

General usage is as follows:

funpack [options] output.tsv input1.tsv input2.tsv

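You can get information on all of the options by typing funpack --help.

Options can be specified on the command line, and/or stored in a configuration file. For example, the options in the following command line: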

funpack \
  --overwrite \
  --import_all \
  --log_file log.txt \
  --icd10_map_file icd_codes.tsv \
  --category 10 \
  --category 11 \
  output.tsv input1.tsv input2.tsv

can be stored in a configuration file, config.txt:

overwrite
import_all
log_file       log.txt
icd10_map_file icd_codes.tsv
category       10
category       11

and then used as follows:

funpack -cfg config.txt output.tsv input1.tsv input2.tsv
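The correspondence between the two forms can be sketched in Python. This is only an illustration of the mapping shown in the example above (each configuration line holds an option name without the leading dashes, optionally followed by its value); it is not FUNPACK's own parser, and the helper name config_to_args is hypothetical.

```python
import shlex


def config_to_args(config_text):
    """Turn configuration-file text into a list of command-line arguments.

    Assumption: each non-empty line is "<option_name> [value]", mirroring
    the config.txt example above. Anything beyond that is not modelled.
    """
    args = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first token is the option name; the rest (if any) is its value.
        parts = line.split(None, 1)
        args.append('--' + parts[0])
        if len(parts) > 1:
            args.extend(shlex.split(parts[1]))
    return args


config = """
overwrite
import_all
log_file       log.txt
icd10_map_file icd_codes.tsv
category       10
category       11
"""

args = config_to_args(config)
command = ['funpack'] + args + ['output.tsv', 'input1.tsv', 'input2.tsv']
print(' '.join(command))
```

Repeated options such as category simply appear on several lines, one value per line, just as they are repeated on the command line.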

Features

FUNPACK allows you to perform various data sanitisation and processing steps on your data, such as:

NA value replacement: Specific values for some columns can be replaced with NA, for example, variables where a value of -1 indicates Do not know.
Categorical recoding: Certain categorical columns can be re-coded. For example, variables where a value of 555 represents half can be recoded so that 555 is replaced with 0.5.
Child value replacement: NA values within some columns which are dependent upon other columns may have values inserted based on the values of their parent columns.
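The three steps above can be pictured with a small pure-Python sketch. It only illustrates the ideas on plain lists, using None as a stand-in for NA; the function names, the sample values, and the parent/child columns are all invented for illustration and say nothing about how FUNPACK implements these steps.

```python
NA = None  # stand-in for a missing value in this sketch


def replace_na(values, na_values):
    """NA value replacement: turn the listed values into NA."""
    return [NA if v in na_values else v for v in values]


def recode(values, old_values, new_values):
    """Categorical recoding: map each old value to its replacement."""
    mapping = dict(zip(old_values, new_values))
    return [mapping.get(v, v) for v in values]


def fill_child(child, parent, condition, fill_value):
    """Child value replacement: fill NA child entries where the
    parent value is present and satisfies the condition."""
    return [
        fill_value if c is NA and p is not NA and condition(p) else c
        for c, p in zip(child, parent)
    ]


# -1 means "Do not know"; 555 means "half".
answers = [3, -1, 555, 2]
cleaned = recode(replace_na(answers, {-1}), [555], [0.5])

# Hypothetical parent/child pair: where the parent is 0, a missing
# child value is filled in with 0.
parent = [0, 1, 0, NA]
child = [NA, 5, NA, NA]
filled = fill_child(child, parent, lambda p: p == 0, 0)

print(cleaned)
print(filled)
```

Note that the last child entry stays missing, because its parent value is itself missing and so no condition can be evaluated.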

For a more comprehensive overview of the features, refer to the introductory notebook provided with FUNPACK.

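Built-in rules

FUNPACK contains a large number of built-in rules which have been written to pre-process UK BioBank data variables. These rules are stored in the following files: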

^{tt7}$: Cleaning rules for data codings
^{tt8}$: Cleaning rules for individual variables
^{tt9}$: Processing steps
^{tt10}$: Variable categories

You can use these rules by selecting the fmrib configuration profile:

funpack -cfg fmrib output.tsv input.tsv

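You can customise or replace these files as you see fit. You can also pass your own versions of these files to FUNPACK via the --variable_file, --datacoding_file, --type_file, --processing_file and --category_file command-line options. FUNPACK will load all variable and data coding files, and merge them into a single table which contains the cleaning rules for each variable.

Creating your own rule files

To define rules at the data coding level, create one or more .tsv files which contain an ID column holding the data coding ID(s), plus any of the following columns: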

^{tt18}$: A comma-separated list of values to replace with NA
^{tt19}$: A comma-separated list of values to be replaced with corresponding values in ^{tt20}$.
^{tt20}$: A comma-separated list of replacement values for each of the values listed in ^{tt19}$.

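To apply these rules, use the --datacoding_file option. They will be applied to all variables which use the data codings listed in the file.

To define rules at the variable level, create one or more .tsv files which contain an ID column holding the variable ID(s), plus any of the following columns: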

^{tt18}$: As above
^{tt19}$: As above
^{tt20}$: As above
^{tt31}$: A comma-separated list of expressions on parent variables, defining conditions which should trigger child-value replacement.
^{tt32}$: A comma-separated list of values to insert into the variable when the corresponding expression in ^{tt31}$ evaluates to true.
^{tt34}$: A comma-separated list of cleaning functions to apply to the variable.

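Output

The main output of FUNPACK is a plain-text tab-delimited [*] file which contains the input data after cleaning and processing, potentially with some columns removed and new columns added.

If you used the --non_numeric_file option, the main output file will contain only the numeric columns; non-numeric columns will be saved to a separate file.

You can load this output file with any tool of your choice, such as Python, MATLAB, or Excel. You can also pass the output back into FUNPACK.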

[*]	You can change the delimiter via the ^{tt36}$ / ^{tt37}$ option.
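As one way of loading the output from Python, the following minimal sketch reads a delimited file with the standard-library csv module. It assumes that the first row holds the column names and that missing values appear as empty fields; both are assumptions about the file layout rather than documented guarantees. The function name load_funpack_output and the sample data are invented for illustration.

```python
import csv
import os
import tempfile


def load_funpack_output(path, delimiter='\t'):
    """Read a delimited text file into (column names, list of row dicts).

    Assumptions: the first row is a header, and an empty field means
    "missing". All values are left as strings; empty fields become None.
    """
    with open(path, newline='') as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        columns = list(reader.fieldnames or [])
        rows = [
            {k: (v if v != '' else None) for k, v in row.items()}
            for row in reader
        ]
    return columns, rows


# Invented sample data, written to a temporary file for demonstration.
sample = "eid\t1-0.0\t2-0.0\n1\t0.5\t\n2\t\tabc\n"
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False) as tmp:
    tmp.write(sample)

columns, rows = load_funpack_output(tmp.name)
os.remove(tmp.name)

print(columns)
print(rows)
```

If the delimiter was changed when the file was written, the same character can be passed through the delimiter argument.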

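Loading output into MATLAB

If you are using MATLAB, you have several options for loading the FUNPACK output. The best option is readtable, which will load column names, and will handle both non-numeric data and missing values. Use readtable like so: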

data = readtable('out.tsv', 'FileType', 'text');

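The readtable function returns a table object, which stores each column as a separate vector (or as a cell array for non-numeric columns). If you are only interested in the numeric columns, you can retrieve them as an array like this: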

data    = data(:, vartype('numeric'));
rawdata = data.Variables;

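The readtable function will potentially rename the columns to ensure that their names are valid MATLAB identifiers. You can retrieve the original names from the table object like so: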

colnames        = data.Properties.VariableDescriptions;
colnames        = regexp(colnames, '''(.+)''', 'tokens', 'once');
empty           = cellfun(@isempty, colnames);
colnames(empty) = data.Properties.VariableNames(empty);
colnames        = vertcat(colnames{:});

If you have used the --description_file option, you can load the description for each column as follows:

descs = readtable('descriptions.tsv', ...
                  'FileType', 'text', ...
                  'Delimiter', '\t',  ...
                  'ReadVariableNames',false);
descs = [descs; {'eid', 'ID'}];
idxs  = cellfun(@(x) find(strcmp(descs.Var1, x)), colnames, ...
                'UniformOutput', false);
idxs  = cell2mat(idxs);
descs = descs.Var2(idxs);

Tests

To run the test suite, you need to install some additional dependencies:

pip install fmrib-unpack[test]

The test suite can then be run using pytest:

pytest

Citing

If you would like to cite FUNPACK, please refer to its Zenodo page.

fmrib-unpack 1.4.1
