Machine learning for genomics



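GenoML-core

GenoML is an automated machine learning (AutoML) tool for genomics. This is the core package of GenoML. Please note that this repo is under development for "functional testing". It is not the final product, only a preliminary evaluation of the software logic. Package website and ongoing documentation: https://genoml.github.io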

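Goals

Please test the code in different environments and on different datasets. The goal is to answer the following questions:

  • Dependencies: do you need to install any packages that are not listed? Are there any dependency errors?
  • Bugs: are there any bugs? Are the error messages unclear?
  • Incorrect output: does the output differ from what you expected?
  • Corner cases: does the code break for the specific case you are testing?
  • Usability: can we improve the way users interact with the code? Is there a particular feature or piece of documentation that would help you?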
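
For the dependency question above, it can help to list which modules are missing before running anything. The following is a minimal sketch and is not part of GenoML. The module names passed in are placeholders, because this document does not list the project's actual dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder list (assumption): replace with the modules genoml.py imports.
    candidates = ["json", "csv"]
    print("Missing modules:", missing_modules(candidates))
```

An empty list means every named module can be imported from the current environment.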

Installation

For now, simply download or clone the genoml-core repo and run GenoML with python genoml.py.

Step-by-step example

See the following quick examples of running GenoML (for the full usage, see Usage):

Step 1 - genoml data-prune:

To perform data-prune on genotype and phenotype data only:

python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno
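
Note that --geno-prefix is a prefix, not a file name: per the Options in the Usage section, it should resolve to PLINK *.bed, *.bim and *.fam files. A small pre-flight check can catch a wrong prefix early. This is only a sketch; missing_plink_files is a hypothetical helper and not part of GenoML.

```python
from pathlib import Path

# File suffixes expected next to a PLINK genotype prefix.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(geno_prefix):
    """Return the expected PLINK files for `geno_prefix` that do not exist."""
    expected = [geno_prefix + suffix for suffix in PLINK_SUFFIXES]
    return [path for path in expected if not Path(path).is_file()]


if __name__ == "__main__":
    print(missing_plink_files("./exampleData/training"))
```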

To perform data-prune on genotype, phenotype, and GWAS data:

python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno  --gwas-file=./exampleData/example_GWAS.txt  

To perform data-prune on genotype, phenotype, GWAS, covariance, and additional data:

python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit  

To perform data-prune on genotype, phenotype, GWAS, and additional data, together with a heritability estimate:

python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno  --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.2  

To perform data-prune on genotype, phenotype, GWAS, covariance, and additional data, together with a heritability estimate:

python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/example.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.5 
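
The data-prune variants above differ only in which optional flags are passed. As an illustration, the hypothetical helper below (not part of GenoML) assembles the argument list from the flags shown in this section. It assumes genoml.py is in the current directory and treats the 0 to 1 heritability range as inclusive.

```python
import sys


def build_data_prune_cmd(geno_prefix, pheno_file, gwas_file=None, cov_file=None,
                         addit_file=None, herit=None, script="genoml.py"):
    """Build the argument list for a data-prune run, skipping unset options."""
    # Assumption: the documented "between 0 and 1" range includes both ends.
    if herit is not None and not 0 <= herit <= 1:
        raise ValueError("herit must be between 0 and 1")
    cmd = [sys.executable, script, "data-prune",
           f"--geno-prefix={geno_prefix}", f"--pheno-file={pheno_file}"]
    optional = [("--gwas-file", gwas_file), ("--cov-file", cov_file),
                ("--addit-file", addit_file), ("--herit", herit)]
    # Only options that were actually supplied are added to the command.
    cmd += [f"{flag}={value}" for flag, value in optional if value is not None]
    return cmd


if __name__ == "__main__":
    print(build_data_prune_cmd("./exampleData/training",
                               "./exampleData/training.pheno",
                               gwas_file="./exampleData/example_GWAS.txt",
                               herit=0.2))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the step.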

Step 2 - genoml model-train:

To perform model-train on the output of data-prune, use the prefix given to you at the end of the prune step, here prune-prefix=./tmp/20181225-230052:

python genoml.py model-train --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno  

Step 3 - genoml model-tune:

To perform model-tune on the output of data-prune, after model-train, use the prefix from the prune step, here prune-prefix=./tmp/20181225-230052:

python genoml.py model-tune --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
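
Both model-train and model-tune take the same --prune-prefix and --pheno-file, and model-tune runs after model-train. A minimal sketch of chaining the two, assuming genoml.py is in the current directory (build_model_cmds is a hypothetical helper, not part of GenoML):

```python
import shlex
import sys


def build_model_cmds(prune_prefix, pheno_file, script="genoml.py"):
    """Return the model-train and model-tune commands in the order they should run."""
    shared = [f"--prune-prefix={prune_prefix}", f"--pheno-file={pheno_file}"]
    return [[sys.executable, script, step] + shared
            for step in ("model-train", "model-tune")]


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, pass each list to
    # subprocess.run(cmd, check=True) so a failed training step stops the chain.
    for cmd in build_model_cmds("./tmp/20181225-230052",
                                "./exampleData/training.pheno"):
        print(shlex.join(cmd))
```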

Step 4 - genoml model-validate:

To perform external model-validate when only genotype and phenotype data are present:

python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno

To perform external model-validate when genotype, phenotype, and GWAS data are present:

python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno --gwas-file=./exampleData/example_GWAS.txt

To perform external model-validate when genotype, phenotype, GWAS, and additional data are present:

python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno --gwas-file=./exampleData/example_GWAS.txt --valid-addit-file=./exampleData/validation.addit

To perform external model-validate when genotype, phenotype, GWAS, additional, and covariance data are present:

python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno --gwas-file=./exampleData/example_GWAS.txt --valid-addit-file=./exampleData/validation.addit --valid-cov-file=./exampleData/validation.cov

Usage

Full GenoML usage:

 Usage:
   genoml data-prune  (--geno-prefix=geno_prefix) (--pheno-file=<pheno_file>) [--gwas-file=<gwas_file>] [--cov-file=<cov_file>] [--herit=<herit>] [--addit-file=<addit_file>] [--temp-dir=<directory>]
   genoml model-train (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) [--n-cores=<n_cores>] [--train-speed=<train_speed>] [--cv-reps=<cv_reps>] [--grid-search=<grid_search>] [--impute-data=<impute_data>]
   genoml model-tune (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) [--cv-reps=<cv_reps>] [--grid-search=<grid_search>] [--impute-data=<impute_data>] [--best-model-name=<best_model_name>]
   genoml model-validate (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) (--geno-prefix=geno_prefix) (--valid-geno-prefix=valid_geno_prefix) (--valid-pheno-file=<valid_pheno_file>) [--valid-cov-file=<valid_cov_file>] [--gwas-file=<gwas_file>] [--valid-addit-file=<valid_addit_file>] [--n-cores=<n_cores>] [--impute-data=<impute_data>]  [--best-model-name=<best_model_name>]
   genoml -h | --help
   genoml --version

 Options:
   --geno-prefix=geno_prefix               Prefix with path to genotype files in PLINK format, *.bed, *.bim and *.fam.
   --pheno-file=<pheno_file>               Path to the phenotype file in PLINK format, *.pheno.
   --gwas-file=<gwas_file>                 Path to the GWAS file, if available.
   --cov-file=<cov_file>                   Path to the covariance file, if available.
   --herit=<herit>                         Heritability estimate of phenotype between 0 and 1, if available.
   --addit-file=<addit_file>               Path to the additional file, if available.
   --temp-dir=<directory>                  Directory for temporary files [default: ./tmp/].
   --n-cores=<n_cores>                     Number of cores to be allocated for computation [default: 1].
   --prune-prefix=prune_prefix             Prefix given to you at the end of pruning stage.
   --train-speed=<train_speed>             Training speed: (ALL, FAST, FURIOUS, BOOSTED). Run all models, only the fastest models, slightly slower models, or just boosted models, which usually perform best when using genotype data [default: BOOSTED].
   --cv-reps=<cv_reps>                     Number of cross-validation repetitions. An integer greater than 5. Affects the speed [default: 5].
   --impute-data=<impute_data>             Imputation: (knn, median). Governs secondary imputation and data transformation [default: median].
   --grid-search=<grid_search>             Grid search length for parameters. An integer greater than 10; 30 or greater is recommended. Affects the speed of the initial tune [default: 10].
   --best-model-name=<best_model_name>     Name for the best model [default: best_model].
   --valid-geno-prefix=valid_geno_prefix   Prefix with path to the validation genotype files in PLINK format, *.bed, *.bim and *.fam.
   --valid-pheno-file=<valid_pheno_file>   Path to the validation phenotype file in PLINK format, *.pheno.
   --valid-cov-file=<valid_cov_file>       Path to the validation covariance file, if available.
   --valid-addit-file=<valid_addit_file>   Path to the validation additional file, if available.
   -h --help                               Show this screen.
   --version                               Show version.

 Examples:
   genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno
   genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno  --gwas-file=./exampleData/example_GWAS.txt
   genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit
   genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno  --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.2
   genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.5
   genoml model-train --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
   genoml model-tune --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
   genoml model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno

 Help:
   For help using this tool, please open an issue on the GitHub repository:
   https://github.com/GenoML/genoml-core/issues
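
The value constraints listed under Options can be checked before launching a long run. The sketch below is illustrative only and is not how GenoML validates its input. check_options is a hypothetical name, the option values are assumed to arrive as a dict keyed by flag, and the lower bound of 1 for --n-cores is an assumption.

```python
# Allowed values as listed under Options.
TRAIN_SPEEDS = {"ALL", "FAST", "FURIOUS", "BOOSTED"}
IMPUTE_METHODS = {"knn", "median"}


def check_options(opts):
    """Return a list of human-readable problems found in `opts`."""
    problems = []
    herit = opts.get("--herit")
    # Assumption: the documented 0 to 1 range is inclusive.
    if herit is not None and not 0 <= float(herit) <= 1:
        problems.append("--herit must be between 0 and 1")
    if opts.get("--train-speed", "BOOSTED") not in TRAIN_SPEEDS:
        problems.append("--train-speed must be one of ALL, FAST, FURIOUS, BOOSTED")
    if opts.get("--impute-data", "median") not in IMPUTE_METHODS:
        problems.append("--impute-data must be knn or median")
    # Assumption: at least one core is required.
    if int(opts.get("--n-cores", 1)) < 1:
        problems.append("--n-cores must be at least 1")
    return problems


if __name__ == "__main__":
    print(check_options({"--herit": "0.2", "--train-speed": "FAST"}))
```

Missing keys fall back to the defaults documented above, so an empty dict passes every check.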

Reporting issues

Please report any issues or suggestions on the GenoML-core issues page.
