mvtest Python package - PyPI

mvtest Python project description

MVtest - GWAS Analysis

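Installation
- Installing with pip
- Manual installation
- System requirements
- Running unit tests
- Virtual Env
- Miniconda
What is MVtest?
- Documentation
- Command-line arguments
mvmany helper script
- The default template
- Command-line arguments
Development notes
- MVtest authors
- Change log
Change log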

Installation

MVtest requires Python 2.7.x as well as the following libraries:

NumPy (version 1.7.2 or later) www.numpy.org
SciPy (version 0.13.2 or later) www.scipy.org
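
As a quick sanity check before installing, the minimum versions listed above can be compared against whatever is already present in your environment. The following is a minimal sketch and not part of MVtest: the helper names `version_tuple`, `meets_minimum` and `check_requirements` are made up for this illustration, and the check relies on the `__version__` attribute that both NumPy and SciPy expose.

```python
import importlib
import re

# Minimum versions taken from the requirements listed above.
MINIMUM_VERSIONS = {"numpy": "1.7.2", "scipy": "0.13.2"}


def version_tuple(version):
    """Turn the leading numeric part of '1.7.2rc1' into (1, 7, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: {0!r}".format(version))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements():
    """Report 'ok', 'too old' or 'missing' for each required library."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(module.__version__, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in sorted(check_requirements().items()):
        print("{0}: {1}".format(name, status))
```

A library reported as missing or too old would be installed or upgraded by the MVtest installer, provided you have write permission as described below.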

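The MVtest installer will attempt to install these required components for you; however, it requires write permission to the installation directory. If you are using a shared system and lack the privileges needed to install libraries and software yourself, see the Miniconda or Virtual Env sections below for instructions on setting up your own Python environment, which will exist entirely under your control.

Installation can be done in two ways:

Install with pip

To install using Python's package manager, pip, simply use the following command:

$ pip install mvtest

If you have the proper permissions to install packages, this will attempt to download and install all dependencies along with MVtest itself.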

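Manual installation

For users who don't use pip, or who wish to run the bundled tests or keep a local copy of the manual, installing manually is almost as easy.

For users with git installed, simply use the following command:

$ git clone https://github.com/edwards-lab/mvtest

Or you can visit the website and download it directly from GitHub: https://github.com/edwards-lab/MVtest

Once you have downloaded the software, simply extract the contents and run the following command to install it:

$ python setup.py install

If no errors are reported, MVtest should be installed and ready to use.

Regarding Python 3: I began updating the code to work with both Python 2 and Python 3; however, some library support issues with version 3 proved frustrating. Until those issues are resolved, I have no plans to invest further time in supporting Python 3.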

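System requirements

Aside from the library dependencies, MVtest's requirements depend largely on the number of SNPs and individuals being analyzed, as well as the data format being used. In general, GWAS-sized datasets need considerably more memory in the traditional pedigree format; however, even 100,000 subjects can be analyzed with less than 1 GB of RAM when the data is formatted as transposed pedigree or PLINK's default bed format.

Otherwise, it is recommended that MVtest be run on a Unix-like system such as Linux or OS X, but it should work under Windows as well (we can't offer support for running MVtest under Windows).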

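Running unit tests

MVtest comes with a unit test suite which can be run prior to installation. To run the tests, simply run the following command from within the root directory of the extracted archive's contents:

$ python setup.py test

If no errors are reported, then MVtest should run correctly on your system.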

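Virtual Env

Virtual Env is a powerful tool for Python programmers and end users alike, as it allows users to deploy different versions of Python applications without needing root access to the machine.

Because MVtest requires version 2.7, you'll need to ensure that your machine's Python version is in compliance. Virtual Env basically uses the system version of Python, but creates a user-owned environment wrapper that allows users to easily install libraries without administrative rights to the machine.

For a helpful introduction to Virtual Env, please have a look at the tutorial: http://www.simononsoftware.com/virtualenv-tutorial/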

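Miniconda

Miniconda is a minimal version of the Anaconda Python distribution. It makes it easy to create local installations of Python with the latest versions of the common scientific libraries for users who don't have root access to their target machines. Basically, when you use Miniconda, you install your own version of Python into a directory under your control, which allows you to install anything else you need without having to submit a helpdesk ticket for administrative assistance.

Unlike pip, the folks behind the conda distributions provide binary downloads of their selected library components. As such, only the most popular libraries, such as pip, NumPy and SciPy, are supported by conda itself. However, these do not require compilation and may be easier to install than with pip alone. I have experienced difficulty installing SciPy through pip and setuptools on our cluster here at Vanderbilt due to non-standard paths for certain required components, but Miniconda always comes through.

First, download and install the appropriate version of Miniconda from the project website. Please be sure to choose the Python 2 version: http://conda.pydata.org/miniconda.html

During the installation, please allow it to update your PATH information. If you prefer not to always use this version of Python in the future, simply tell it not to update your .bashrc file and note the instructions for loading and unloading your new Python environment. Please note that even if you choose to update your .bashrc file, you will need to follow the directions for loading the changes into your current shell.

Once those changes have taken effect, install setuptools and SciPy:

$ conda install pip scipy

Installing SciPy will also force the installation of NumPy, which is also required for running MVtest. (Setuptools includes easy_install.)

Once that has completed successfully, you should be ready to follow the standard instructions for installing MVtest.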

What is MVtest?

TODO: Write some background information about the application and its scientific basis.

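Documentation

Documentation for MVtest is still under construction. However, the application provides reasonable inline help using the standard Unix help arguments:

> mvtest.py -h

or

> mvtest.py --help

In general, overlapping functionality should mimic that of PLINK.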

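Command-line arguments

Command-line arguments used by MVtest often mimic those used by PLINK, except where there is no matching functionality (or the functionality differs greatly).

For the parameters listed below, when a parameter requires a value, the value must follow the argument with a single space separating the two (no '=' signs). For flags with no specified value, passing the flag indicates that the condition is to be "activated".

When there is no value listed in the "Type" column, the argument is off by default and on when the argument is present (i.e. by default, compression is turned off unless the flag --compressed has been provided).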
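
The conventions above (values as a separate token, bare flags for on/off switches) can be sketched as a small helper that assembles an argument list for a script or job launcher. This is only an illustration: `build_args` is a hypothetical helper rather than part of MVtest, and the prefix `study` is a placeholder.

```python
def build_args(options):
    """Flatten a dict of options into an argv-style list.

    True emits the bare flag, False or None omits it, and any other value
    is emitted as its own token after the flag (never as --flag=value).
    """
    args = []
    for flag in sorted(options):
        value = options[flag]
        if value is True:
            args.append(flag)
        elif value is False or value is None:
            continue
        else:
            args.extend([flag, str(value)])
    return args


options = {
    "--tfile": "study",    # placeholder prefix for .tped/.tfam files
    "--compressed": True,  # switch: present means "on"
    "--chr": 1,            # value follows the flag after a space
    "--sex": False,        # switch left off, so it is not emitted
}
args = build_args(options)
print(" ".join(["mvtest.py"] + args))
```

Keeping the arguments as a list rather than a single string also makes it straightforward to hand them to `subprocess` without shell quoting issues.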

Getting help
`-h, --help` Show this help message and exit.
`-v` Print version number

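Input Data

MVtest attempts to mimic the interface for PLINK where appropriate.

All input files should be whitespace delimited. For text-based allelic annotations, 1 2 and A C G T annotations are sufficient. All data must be expressed as alleles, not as genotypes (except for IMPUTE output, which is a specialized format that is very different from the other forms).

For Pedigree, Transposed Pedigree and PLINK binary pedigree files, using the prefix arguments is sufficient if your files follow the standard naming conventions.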

Pedigree data
Pedigree data is fully supported; however, it is not recommended. When loading pedigree data, MVtest must load the entire dataset into memory prior to analysis, which can result in a substantial amount of unnecessary memory overhead.
Flags like --no-pheno and --no-sex can be used in any combination, which allows for input files with highly flexible header structures.
`--file <prefix>`
	(filename prefix) Prefix for .ped and .map files
`--ped <filename>`
	PLINK compatible .ped file
`--map <filename>`
	PLINK compatible .map file
`--map3`	Map file has only 3 columns
`--no-sex`	Pedigree file doesn't have column 5 (sex)
`--no-parents`	Pedigree file doesn't have columns 3 and 4 (parents)
`--no-fid`	Pedigree file doesn't have column 1 (family ID)
`--no-pheno`	Pedigree file doesn't have column 6 (phenotype)
`--liability`	Pedigree file has column 7 (liability)

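PLINK binary pedigree
This format represents the most efficient storage for large GWAS datasets and can be used directly by MVtest. In addition to having minimal overhead, it benefits from an efficient disk layout.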
`--bfile <prefix>`
(filename prefix) <prefix> for .bed, .bim and .fam files
`--bed <filename>`
Binary Ped file(.bed)
`--bim <filename>`
Binary Ped marker file (.bim)
`--fam <filename>`
Binary Ped family file (.fam)

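Transposed pedigree data
Transposed pedigree data is similar to standard pedigree data, except that the data is organized with SNPs as rows instead of individuals. This allows MVtest to run its analysis without loading the entire dataset into memory.
`--tfile <prefix>`
	Prefix for .tped and .tfam files
`--tped <filename>`
	Transposed Pedigree file (.tped)
`--tfam <filename>`
	Transposed Pedigree Family file (.tfam)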
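
To illustrate why the SNP-major layout helps, the sketch below reads transposed pedigree lines one at a time and tallies alleles for each SNP without holding any other SNP in memory. It assumes the standard PLINK .tped layout (chromosome, SNP ID, genetic distance, base-pair position, then two alleles per individual) with 0 as the missing allele code. The function names and sample lines are invented for this example, and this is not MVtest's own parser.

```python
from collections import Counter


def parse_tped_line(line):
    """Split one .tped line into (chrom, rsid, pos, list of allele pairs)."""
    fields = line.split()
    chrom, rsid, _genetic_distance, pos = fields[:4]
    alleles = fields[4:]
    if len(alleles) % 2 != 0:
        raise ValueError("odd number of alleles for {0}".format(rsid))
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return chrom, rsid, int(pos), genotypes


def allele_counts(lines):
    """Yield (rsid, allele counts) one SNP at a time from an iterable of lines."""
    for line in lines:
        if not line.strip():
            continue
        _chrom, rsid, _pos, genotypes = parse_tped_line(line)
        counts = Counter(
            allele for pair in genotypes for allele in pair if allele != "0"
        )
        yield rsid, dict(counts)


# Three individuals, two SNPs; the second individual is missing at rs2.
sample_lines = [
    "1 rs1 0 1000 A A A G G G",
    "1 rs2 0 2000 C C 0 0 C T",
]
for rsid, counts in allele_counts(sample_lines):
    print(rsid, sorted(counts.items()))
```

Because `allele_counts` is a generator, passing it an open file object instead of a list would process a large file with memory use proportional to a single SNP row.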

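Pedigree/Transposed pedigree common flags

By default, Pedigree and Transposed Pedigree data is assumed to be uncompressed. However, MVtest can read gzip-compressed files if they have the extension .tgz and the --compressed argument is added.

--compressed

Indicate that ped/tped files have been compressed with gzip and are named with extensions such as .ped.tgz and .tped.tgz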

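IMPUTE output

MVtest doesn't call genotypes when performing analysis and allows users to define which model to use when analyzing the data. Because there is no specific location for the chromosome within the input files, MVtest requires that users provide the chromosome, the IMPUTE input file and the corresponding .info file for each IMPUTE output.

Due to the huge number of expected loci, MVtest allows users to specify an offset and file count for analysis. This allows users to run multiple jobs simultaneously on a cluster, each working on separate IMPUTE region files. Users can segment those regions even further using standard MVtest region selection.

By default, all IMPUTE data is assumed to be compressed using gzip.

The default naming convention is for IMPUTE data files to end in .gen.gz and for the info files to have the same name, except that the ending is replaced with .info.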

`--impute <filename>`
	File containing list of impute output for analysis
`--impute-fam <filename>`
	File containing family details for impute data
`--impute-offset <integer>`
	Impute file index (1 based) to begin analysis
`--impute-count <integer>`
	Number of impute files to process (for this node). Defaults to all remaining.
`--impute-uncompressed`
	Indicate that the impute input is not gzipped, but plain text
`--impute-encoding`
	(additive,dominant or recessive) Genetic model to be used when analyzing imputed data.
`--impute-info-ext <extension>`
	Portion of filename that denotes info file
`--impute-gen-ext <extension>`
	Portion of filename that denotes gen file
`--impute-info-thresh <float>`
	Threshold for filtering imputed SNPs with poor ‘info’ values

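IMPUTE file input
When performing an analysis on IMPUTE output, users must provide a single file which lists each of the gen files to be analyzed. This plain text file contains 2 (or optionally 3) columns for each gen file:

Chromosome	Gen File	.info <filename> (optional)
N	<filename>	<filename>
…	…	…

The 3rd column is only required if your .info files and .gen files are not the same except for the <extension>.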
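
The list-file layout described above can be sketched as a small parser that fills in the .info name when the third column is omitted. This is an illustration under stated assumptions, not MVtest's own reader: `parse_impute_list` is a hypothetical helper, the file names are placeholders, and it assumes that the whole `.gen.gz` ending is what gets replaced with `.info`.

```python
def parse_impute_list(lines, gen_ext=".gen.gz", info_ext=".info"):
    """Return (chromosome, gen file, info file) tuples from list-file lines."""
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError("expected 2 or 3 columns, got: {0!r}".format(line))
        chrom, gen_file = fields[0], fields[1]
        if len(fields) == 3:
            info_file = fields[2]  # explicit info file name
        elif gen_file.endswith(gen_ext):
            # Assumed convention: swap the gen extension for the info extension.
            info_file = gen_file[: -len(gen_ext)] + info_ext
        else:
            raise ValueError(
                "cannot derive an info file name for {0!r}".format(gen_file)
            )
        entries.append((chrom, gen_file, info_file))
    return entries


listing = [
    "1 chr1.part1.gen.gz",
    "2 chr2.gen.gz custom/chr2.quality",
]
for entry in parse_impute_list(listing):
    print(entry)
```

The `gen_ext` and `info_ext` parameters play the role that `--impute-gen-ext` and `--impute-info-ext` appear to play on the command line, although how MVtest applies them internally is not described here.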

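MACH output

Users can analyze data imputed with MACH. Because most situations require many files, the input is a single file which contains either pairs of dosage/info files or, if the two files share the same filename except for the extension, one dosage file per line.

Important: MACH doesn't provide anywhere to store chromosome and position. Users may wish to embed this information into the first column of the .info file. Doing so allows MVtest to recognize those values and populate the report. To use this feature, users must pass the --mach-chrpos flag, and the ID column inside the .info file must be formatted as chr:pos (optionally followed by :rsid). When the --mach-chrpos flag is used, MVtest will fail when it encounters IDs that aren't in this format; there must be at least 2 "fields" (i.e. there must be at least one ":" character). When processing MACH imputed data without this special ID encoding, MVtest will be unable to recognize positions. As a result, unless the --mach-chrpos flag is present, MVtest will exit with an error if the user attempts to use positional filters such as --from-bp, --chr, etc.

When running MVtest using MACH dosage on a cluster, users can instruct a given job to analyze a portion of the files listed in the dosage file by changing the --mach-offset and --mach-count arguments. By default, the offset starts at 1 (the first file in the dosage list) and MVtest runs all the files it finds. However, to split the work so that each job analyzes three dosage files, one might set those values to --mach-offset 1 --mach-count 3 or --mach-offset 4 --mach-count 3, depending on which job is being defined.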
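
The offset and count arithmetic for splitting a dosage list across jobs can be worked through as follows. This is a minimal sketch: `job_windows` is a hypothetical helper, and the total of 8 dosage files is an arbitrary example.

```python
def job_windows(total_files, files_per_job):
    """Return (offset, count) pairs covering files 1..total_files (1-based)."""
    if total_files < 0 or files_per_job < 1:
        raise ValueError("total_files must be >= 0 and files_per_job >= 1")
    windows = []
    offset = 1
    while offset <= total_files:
        # The last job may receive fewer files than the others.
        count = min(files_per_job, total_files - offset + 1)
        windows.append((offset, count))
        offset += files_per_job
    return windows


for offset, count in job_windows(total_files=8, files_per_job=3):
    print("--mach-offset {0} --mach-count {1}".format(offset, count))
```

With 8 files and 3 files per job, the windows start at offsets 1, 4 and 7, and the final job covers only the two remaining files.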

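In order to minimize memory requirements, MACH dosage files can be loaded incrementally so that only N loci are held in memory at a time. This can be controlled using the --mach-chunk-size argument. The larger this number is, the faster MVtest will run (fewer reads from the file), but the more memory is required.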

`--mach <filename>`
	File containing list of dosages, one per line. Optionally, lines may contain the info names as well (separated by whitespace) if the two <filename>s do not share a common base name.
`--mach-offset <integer>`
	Index into the MACH file to begin analyzing
`--mach-count <integer>`
	Number of dosage files to analyze
`--mach-uncompressed`
	By default, MACH input is expected to be gzip compressed. If data is plain text, add this flag. It should be noted that dosage and info files should be either both compressed or both uncompressed.
`--mach-chunk-size <integer>`
	Due to the individual orientation of the data, large dosage files are parsed in chunks in order to minimize excessive memory during loading
`--mach-info-ext <extension>`
	Indicate the <extension> used by the mach info files
`--mach-dose-ext <extension>`
	Indicate the <extension> used by the mach dosage files
`--mach-min-rsquared <float>`
	Indicate the minimum threshold for the rsquared value from the .info files required for analysis.
`--mach-chrpos`	When set, MVtest expects IDs from the .info file to be in the format chr:pos:rsid (rsid is optional). This will allow the report to contain positional details, otherwise, only the RSID column will have a value which will be the contents of the first column from the .info file
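
The chr:pos:rsid ID format expected with --mach-chrpos can be illustrated with a small parser. This is a sketch of the format only, not MVtest's implementation: `parse_chrpos_id` is a hypothetical helper and the IDs are made-up examples.

```python
def parse_chrpos_id(snp_id):
    """Split 'chr:pos' or 'chr:pos:rsid' into (chrom, pos, rsid or None)."""
    fields = snp_id.split(":")
    if len(fields) < 2:
        # Mirrors the rule that at least one ':' must be present.
        raise ValueError("ID {0!r} is not in chr:pos[:rsid] format".format(snp_id))
    chrom = fields[0]
    pos = int(fields[1])  # raises ValueError if the position is not an integer
    rsid = fields[2] if len(fields) > 2 else None
    return chrom, pos, rsid


print(parse_chrpos_id("1:12345:rs0001"))
print(parse_chrpos_id("X:500"))
```

An ID without any ":" character, such as a bare rsid, raises `ValueError` here, which corresponds to the failure described for IDs that are not in the expected format.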

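MACH file input
When running an analysis on MACH output, users must provide a single file which lists each dosage file and (optionally) the matching .info file. This file is a simple text file with either 1 column (the dosage filename) or 2 columns (the dosage filename followed by the info filename, separated by whitespace).
The 2nd column is only required if the filenames aren't identical except for the extension.

Col 1 (dosage <filename>)	Col 2 (optional info <filename>)
<filename>.dose	<filename>.info
…	…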

Phenotype/Covariate data
Phenotype and covariate data can be found either inside the standard pedigree headers or within special PLINK style covariate files. Users can specify phenotypes and covariates using either header names (if a header exists in the file) or 1-based column indices. An index of 1 actually means the first variable column, not the first column of the file. In general, this will be the 3rd column, since columns 1 and 2 hold FID and IID.
`--pheno <filename>`
	File containing phenotypes. Unless --all-pheno is present, user must provide either index(s) or label(s) of the phenotypes to be analyzed.
`--mphenos LIST`	Column number(s) for phenotype to be analyzed if number of columns > 1. Comma separated list if more than one is to be used.
`--pheno-names LIST`
	Name for phenotype(s) to be analyzed (must be in --pheno file). Comma separated list if more than one is to be used.
`--covar <filename>`
	File containing covariates
`--covar-numbers LIST`
	Comma-separated list of covariate indices
`--covar-names LIST`
	Comma-separated list of covariate names
`--sex`	Use sex from the pedigree file as a covariate
`--missing-phenotype CHAR`
	Encoding for missing phenotypes as can be found in the data.
`--all-pheno`	When present, MVtest will run each phenotype found inside the phenotype file.
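
The 1-based indexing rule described above (index 1 refers to the first variable column, which is normally file column 3) can be worked through in a few lines. This is only an illustration: `variable_column` is a hypothetical helper and the header names and values are invented.

```python
def variable_column(index):
    """Map a 1-based variable index to a 1-based file column.

    Columns 1 and 2 hold FID and IID, so variable 1 lives in column 3.
    """
    if index < 1:
        raise ValueError("variable indices are 1-based")
    return index + 2


# Invented phenotype file header and one data row.
header = ["FID", "IID", "BMI", "LDL", "HDL"]
row = ["fam1", "ind1", "24.1", "130", "55"]

selected = [1, 3]  # as might be passed via --mphenos 1,3
for index in selected:
    column = variable_column(index)
    print("index {0} -> column {1} ({2} = {3})".format(
        index, column, header[column - 1], row[column - 1]))
```

Here index 1 selects BMI from column 3 and index 3 selects HDL from column 5.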

Restricting regions for analysis
When specifying a range of positions for analysis, a chromosome must be present. If a chromosome is specified but is not accompanied by a range, the entire chromosome will be used. Only one range can be specified per run.
In general, when specifying region limits, --chr must be defined unless using generic MACH input (which defines neither a chromosome number nor a position, in which case positional restrictions do not apply).
`--snps LIST`	Comma-delimited list of SNP(s): rs1,rs2,rs3-rs6
`--chr <integer>`
	Select Chromosome. If not selected, all chromosomes are to be analyzed.
`--from-bp <integer>`
	SNP range start
`--to-bp <integer>`
	SNP range end
`--from-kb <integer>`
	SNP range start
`--to-kb <integer>`
	SNP range end
`--from-mb <integer>`
	SNP range start
`--to-mb <integer>`
	SNP range end
`--exclude LIST`	Comma-delimited list of rsids to be excluded
`--remove LIST`	Comma-delimited list of individuals to be removed from analysis. This must be in the form of family_id:individual_id
`--maf <float>`	Minimum MAF allowed for analysis
`--max-maf <float>`
	MAX MAF allowed for analysis
`--geno <integer>`
	MAX per-SNP missing for analysis
`--mind <integer>`
	MAX per-person missing
`--verbose`	Output additional data details in final report
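
The relationship between the bp, kb and mb range options and the chromosome requirement can be sketched as below. This is not MVtest's filtering code: it assumes 1 kb = 1,000 bp and 1 Mb = 1,000,000 bp, treats both range ends as inclusive, and uses hypothetical helper names and made-up SNP records.

```python
# Assumed unit sizes, expressed in base pairs.
UNIT_TO_BP = {"bp": 1, "kb": 1000, "mb": 1000000}


def to_bp(value, unit):
    """Convert a range boundary given in bp, kb or mb to base pairs."""
    return value * UNIT_TO_BP[unit]


def in_region(chrom, pos, sel_chrom, start_bp=None, end_bp=None):
    """True when a SNP lies on the selected chromosome and inside the range.

    With no start and end given, the whole chromosome is accepted.
    """
    if chrom != sel_chrom:
        return False
    if start_bp is not None and pos < start_bp:
        return False
    if end_bp is not None and pos > end_bp:
        return False
    return True


# Made-up SNP records: (name, chromosome, position in bp).
snps = [
    ("rs_a", 1, 950000),
    ("rs_b", 1, 1200000),
    ("rs_c", 1, 2100000),
    ("rs_d", 2, 1500000),
]
start = to_bp(1, "mb")    # comparable to --from-mb 1
end = to_bp(2000, "kb")   # comparable to --to-kb 2000
kept = [name for name, chrom, pos in snps if in_region(chrom, pos, 1, start, end)]
print(kept)
```

Only the SNP at 1,200,000 bp on chromosome 1 falls inside the 1 Mb to 2 Mb window in this example.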

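mvmany helper script

In addition to the analysis program, mvtest.py, a helper script, mvmany.py, is also included and can be used to split large jobs into smaller ones suitable for running on a compute cluster. Users simply run mvmany.py just as they would run mvtest.py, but with a few additional parameters, and mvmany.py will build multiple job scripts to run the jobs on multiple nodes. It records the arguments passed to it and writes them to the scripts it produces.

It is important to note that mvmany.py simply generates cluster scripts and does not submit them.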

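The default template

When mvmany.py is first run, it will generate a copy of the default template inside the user's home directory, named .mv-many.template. This template is used to define the job details that will be written to each of the job scripts. By default, the template is configured for the SLURM cluster software, but it can easily be changed to work with any cluster software that works similarly to the SLURM job manager, such as TORQUE/PBS or Sun Grid Engine.

In addition to being able to replace the preprocessor definitions to work with different cluster manager software, users can also add user-specific definitions, such as email notifications or account specifications, giving users the ability to run the software under different system configurations.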

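Example template (SLURM)

An example template might look like the following

#!/bin/bash
#SBATCH --job-name=$jobname
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=$memory
#SBATCH --time=$walltime
#SBATCH --error $logpath/$jobname.e
#SBATCH --output $respath/$jobname.txt
cd $pwd
$body

It is important to note that this block of text contains both SLURM preprocessor settings (such as #SBATCH --job-name) and variables which will be replaced with appropriate values (such as $jobname, which is replaced with a string that is unique to that particular job). Each cluster type has its own syntax for setting the necessary variables, and it is assumed that the user knows how to correctly edit the default template to suit their needs.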
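
The variable replacement described above can be tried out with Python's standard `string.Template`, which happens to use the same `$name` syntax. Whether mvmany.py itself uses `string.Template` is not stated here, so treat this purely as a way to preview what a filled-in template would look like; the job name, paths and command body are placeholder values.

```python
from string import Template

# A shortened version of the SLURM template shown above.
TEMPLATE_TEXT = """#!/bin/bash
#SBATCH --job-name=$jobname
#SBATCH --mem=$memory
#SBATCH --time=$walltime
#SBATCH --error $logpath/$jobname.e
#SBATCH --output $respath/$jobname.txt
cd $pwd
$body
"""

# Placeholder values chosen for illustration only.
values = {
    "jobname": "mvtest-chr1",
    "memory": "2G",
    "walltime": "3:00:00",
    "logpath": "logs",
    "respath": "results",
    "pwd": "/home/user/project",
    "body": "mvtest.py --bfile study --chr 1",
}

# substitute() raises KeyError if the template uses a variable that is
# missing from `values`, which helps catch typos in a customised template.
script = Template(TEMPLATE_TEXT).substitute(values)
print(script)
```

Running a customised template through a check like this before generating many job scripts is one way to spot a misspelled variable early.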

Example TORQUE template
For instance, to use these scripts on a TORQUE-based cluster, one might update ~/.mvmany.template to the following

#!/bin/bash
#PBS -N $jobname
#PBS -l nodes=1
#PBS -l ppn=1
#PBS -l mem=$memory
#PBS -l walltime=$walltime
#PBS -e $logpath/$jobname.e
#PBS -o $respath/$jobname.txt
cd $pwd
$body

Please note that not all SLURM settings map directly to PBS settings, and users need to understand how to properly configure their cluster job headers.
In general, the user should make sure that each of the variables is properly defined so that the corresponding values will be written to the final job scripts. The following variables are replaced based on the job being performed and the parameters passed to the program by the user (or their default values):

Variable	Purpose
$jobname	Unique name for the current job
$memory (2G)	Amount of memory to provide each job.
$walltime (3:00:00)	Define amount of time to be assigned to jobs
$logpath	Directory specified for writing logs
$respath	Directory specified for writing results
$pwd	Current working directory when mvmany is run
$body	Statements of execution

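Command-line arguments

mvmany.py exposes the following additional arguments for use when running the script.

`--mv-path PATH`	Set path to mvtest.py if it's not in PATH
`--logpath PATH`	Path to location of job's error output
`--res-path PATH`	Path to location of job's results
`--script-path PATH`
	Path for writing script files
`--template FILENAME`
	Specify a template other than the default
`--snps-per-job INTEGER`
	Specify the number of SNPs to be run at one time
`--mem STRING`	Specify the amount of memory to be requested for each job
`--wall-time`	Specify amount of time to be requested for each job

The option --mem depends on the data being analyzed as well as the configurable options in use. Users should perform basic test runs to determine proper settings for their jobs. By default, 2G is used, which is generally adequate for binary pedigrees, IMPUTE data and transposed pedigrees. Others will vary greatly based on the size of the dataset and the settings being used.

The option --wall-time is largely machine dependent, but will also vary based on the actual dataset's size and the completeness of the data. Users should perform spot tests to determine reasonable values. By default, the requested wall time is 3 days, which is sufficient for GWAS datasets but may not be sufficient for whole exome datasets. The time required will depend on how many SNPs are being analyzed by any given node.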

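In general, mvmany.py accepts all arguments that mvtest.py accepts, with the exception of those that are handled by mvmany.py itself. These include the following arguments

--chr --snps --from-bp --to-bp --from-kb --to-kb --from-mb --to-mb

To see a comprehensive list of the arguments that mvmany.py can use, simply ask the program itself

mvmany.py --help

Users can have mvmany split certain types of jobs into pieces and can specify how many independent commands are to be run per job. At this time, mvmany.py assumes that imputation data has already been split into fragments and doesn't support running parts of a single file on multiple nodes.

The results generated can be merged once all nodes have completed execution.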

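Change log

mvtest.py: 1.0.4

Fixed a bug associated with running multiple phenotypes at once.

mvtest.py: 1.0.3

Removed the special requirement for MACH input (chr:pos) and made it optional.

mvtest.py: 1.0.2

Added an exception when using improperly formatted MACH info files
Updated documentation to draw attention to the additional MACH info file requirement

mvtest.py: 1.0.1 release

Changes to setup.cfg and setup.py to accommodate changes made to gh-pages.

mvtest.py: 1.0.0 release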

mvtest 1.0.5
