Machine learning for genomics
GenoML core
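GenoML is an automated machine learning (AutoML) tool for genomics. This is the GenoML core package. Please note that this repo is under development for "functionality testing". It is not the final product, only a preliminary assessment of the software logic. Package website and ongoing documentation: https://genoml.github.io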
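Goals
Please test the code in different environments and on different datasets. The goal is to answer the following questions:
- Dependencies: Do you need to install any packages that are not already listed? Are there dependency errors?
- Errors: Are there any errors? Are any error messages unclear?
- Wrong output: Are there any discrepancies from the expected output?
- Corner cases: Does the code break for the specific case you are testing?
- Usability: Can we improve the way users interact with the code? Are there particular functions or files that work well for you?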
Installation
For now, simply download or clone the genoml-core repo and run GenoML with python genoml.py.
Step-by-step examples
See the following quick examples of running GenoML (for full usage, see Usage):
Step 1 - GenoML data prune:
Run data-prune on genotype and phenotype data only:
python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno
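Since --geno-prefix names a PLINK binary fileset (*.bed, *.bim and *.fam, per the Options list under Usage), a quick pre-flight check that all three files exist can save a failed run. The helper below, missing_plink_files, is a hypothetical sketch and not part of GenoML:

```python
from pathlib import Path
from typing import List

# A PLINK binary fileset is three files sharing one prefix.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix: str) -> List[str]:
    """Return the expected PLINK files for `prefix` that do not exist.

    An empty list means the .bed/.bim/.fam trio is complete.
    """
    return [prefix + s for s in PLINK_SUFFIXES if not Path(prefix + s).is_file()]


if __name__ == "__main__":
    missing = missing_plink_files("./exampleData/training")
    print("Missing PLINK files: " + ", ".join(missing) if missing else "PLINK fileset looks complete.")
```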
Run data-prune on genotype, phenotype, and GWAS data:
python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --gwas-file=./exampleData/example_GWAS.txt
Run data-prune on genotype, phenotype, GWAS, covariance, and additional data:
python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit
Run data-prune on genotype, phenotype, GWAS, and additional data, plus a heritability estimate:
python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.2
Run data-prune on genotype, phenotype, GWAS, covariance, and additional data, plus a heritability estimate:
python genoml.py data-prune --geno-prefix=./exampleData/training --pheno-file=./exampleData/example.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.5
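The data-prune variants above differ only in which optional inputs are passed. The Usage text is written in docopt style, and docopt accepts options in any order; assuming GenoML parses its arguments that way, a minimal sketch can build the matching command from whichever inputs you have. data_prune_cmd is a hypothetical helper, not a GenoML API; the 0 to 1 check on herit mirrors the --herit description.

```python
import shlex
from typing import List, Optional


def data_prune_cmd(geno_prefix: str, pheno_file: str,
                   gwas_file: Optional[str] = None,
                   cov_file: Optional[str] = None,
                   addit_file: Optional[str] = None,
                   herit: Optional[float] = None,
                   temp_dir: Optional[str] = None) -> List[str]:
    """Build an argv list for `python genoml.py data-prune` (hypothetical helper).

    Required inputs always appear; optional flags are added only when given.
    """
    if herit is not None and not 0 <= herit <= 1:
        raise ValueError("--herit must be between 0 and 1")
    cmd = ["python", "genoml.py", "data-prune",
           f"--geno-prefix={geno_prefix}", f"--pheno-file={pheno_file}"]
    optional = [("--gwas-file", gwas_file), ("--cov-file", cov_file),
                ("--addit-file", addit_file), ("--herit", herit),
                ("--temp-dir", temp_dir)]
    cmd += [f"{flag}={value}" for flag, value in optional if value is not None]
    return cmd


if __name__ == "__main__":
    cmd = data_prune_cmd("./exampleData/training", "./exampleData/training.pheno",
                         gwas_file="./exampleData/example_GWAS.txt", herit=0.2)
    print(shlex.join(cmd))  # printable shell form of the command
```

Passing the resulting list to subprocess.run(cmd, check=True) from the genoml-core checkout would launch the step; building an argv list avoids shell-quoting problems with paths.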
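Step 2 - GenoML model train:
Run model-train on the output of data-prune, using the prefix reported at the end of the prune step (here prune-prefix=./tmp/20181225-230052; replace it with the prefix from your own data-prune run):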
python genoml.py model-train --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
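model-train takes the same --pheno-file used in data-prune, so checking that file before training helps with the corner-case testing described under Goals. This sketch assumes the standard PLINK phenotype layout (whitespace-separated FID, IID and phenotype columns, with -9 as PLINK's default missing value). Whether GenoML requires or forbids a header row is not stated here, so the reader simply skips a line whose first field is FID. read_pheno and summarize are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

PLINK_MISSING = -9.0  # PLINK's default missing-phenotype value


def read_pheno(path: str) -> Dict[Tuple[str, str], Optional[float]]:
    """Map (FID, IID) to the first phenotype value; -9 (missing) becomes None.

    Assumes whitespace-separated FID, IID, phenotype columns and skips an
    optional header line whose first field is "FID".
    """
    pheno: Dict[Tuple[str, str], Optional[float]] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3 or fields[0] == "FID":
                continue  # blank, short, or header line
            value = float(fields[2])
            pheno[(fields[0], fields[1])] = None if value == PLINK_MISSING else value
    return pheno


def summarize(pheno: Dict[Tuple[str, str], Optional[float]]) -> Counter:
    """Count samples per phenotype value (PLINK case/control coding: 1 = control, 2 = case)."""
    return Counter("missing" if v is None else v for v in pheno.values())
```

For example, summarize(read_pheno("./exampleData/training.pheno")) shows at a glance how many samples are cases, controls, or missing.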
Step 3 - GenoML model tune:
After model-train, run model-tune on the output of data-prune, using the same prefix from the prune step, prune-prefix=./tmp/20181225-230052:
python genoml.py model-tune --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
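Steps 2 and 3 reuse the same --prune-prefix and --pheno-file, and model-tune runs after model-train. A minimal sketch, assuming GenoML is launched as python genoml.py from the repo checkout and exits with a non-zero status when a step fails, builds both commands from one prefix and runs them in order. check=True makes subprocess.run raise CalledProcessError on a non-zero exit, so tuning would not start after a failed training run. Both function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def train_then_tune(prune_prefix: str, pheno_file: str) -> List[List[str]]:
    """Return the model-train and model-tune commands, in the order they run."""
    shared = [f"--prune-prefix={prune_prefix}", f"--pheno-file={pheno_file}"]
    return [["python", "genoml.py", step, *shared]
            for step in ("model-train", "model-tune")]


def run_in_order(cmds: List[List[str]],
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in turn; check=True stops at the first non-zero exit."""
    for cmd in cmds:
        runner(cmd, check=True)
```

Usage: run_in_order(train_then_tune("./tmp/20181225-230052", "./exampleData/training.pheno")). The runner parameter exists only so the sequencing can be exercised without launching GenoML.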
Step 4 - GenoML model validate:
Run external model-validate when only genotype and phenotype data are available:
python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno
Run external model-validate when genotype, phenotype, and GWAS data are available:
python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno --gwas-file=./exampleData/example_GWAS.txt
Run external model-validate when genotype, phenotype, GWAS, and additional data are available:
Run external model-validate when genotype, phenotype, GWAS, additional, and covariance data are available:
python genoml.py model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno --gwas-file=./exampleData/example_GWAS.txt --valid-addit-file=./exampleData/validation.addit --valid-cov-file=./exampleData/validation.cov
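External validation pairs the validation genotypes (--valid-geno-prefix) with the validation phenotypes (--valid-pheno-file). Samples present in one file but not the other are a likely corner case, so a pre-check can be useful. This sketch assumes a standard PLINK .fam file (FID and IID in the first two columns) and a phenotype file whose first two columns are FID and IID, optionally under a header row starting with FID. How GenoML itself treats unmatched samples is not documented here, and the helper names are hypothetical.

```python
from typing import Set, Tuple

Sample = Tuple[str, ...]  # (FID, IID)


def fam_samples(fam_path: str) -> Set[Sample]:
    """(FID, IID) pairs from the first two columns of a PLINK .fam file."""
    with open(fam_path) as fh:
        return {tuple(line.split()[:2]) for line in fh if line.strip()}


def pheno_samples(pheno_path: str) -> Set[Sample]:
    """(FID, IID) pairs from a phenotype file, skipping an optional 'FID ...' header."""
    with open(pheno_path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    return {tuple(r[:2]) for r in rows if r[0] != "FID"}


def unmatched_samples(valid_geno_prefix: str,
                      valid_pheno_file: str) -> Tuple[Set[Sample], Set[Sample]]:
    """Return (samples only in the .fam, samples only in the phenotype file)."""
    geno = fam_samples(valid_geno_prefix + ".fam")
    pheno = pheno_samples(valid_pheno_file)
    return geno - pheno, pheno - geno
```

With the Step 4 paths, unmatched_samples("./exampleData/validation", "./exampleData/validation.pheno") returns two empty sets when every sample is matched.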
Usage
Full GenoML usage:
Usage:
genoml data-prune (--geno-prefix=geno_prefix) (--pheno-file=<pheno_file>) [--gwas-file=<gwas_file>] [--cov-file=<cov_file>] [--herit=<herit>] [--addit-file=<addit_file>] [--temp-dir=<directory>]
genoml model-train (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) [--n-cores=<n_cores>] [--train-speed=<train_speed>] [--cv-reps=<cv_reps>] [--grid-search=<grid_search>] [--impute-data=<impute_data>]
genoml model-tune (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) [--cv-reps=<cv_reps>] [--grid-search=<grid_search>] [--impute-data=<impute_data>] [--best-model-name=<best_model_name>]
genoml model-validate (--prune-prefix=prune_prefix) (--pheno-file=<pheno_file>) (--geno-prefix=geno_prefix) (--valid-geno-prefix=valid_geno_prefix) (--valid-pheno-file=<valid_pheno_file>) [--valid-cov-file=<valid_cov_file>] [--gwas-file=<gwas_file>] [--valid-addit-file=<valid_addit_file>] [--n-cores=<n_cores>] [--impute-data=<impute_data>] [--best-model-name=<best_model_name>]
genoml -h | --help
genoml --version
Options:
--geno-prefix=geno_prefix Prefix with path to genotype files in PLINK format, *.bed, *.bim and *.fam.
--pheno-file=<pheno_file> Path to the phenotype file in PLINK format, *.pheno.
--gwas-file=<gwas_file> Path to the GWAS file, if available.
--cov-file=<cov_file> Path to the covariance file, if available.
--herit=<herit> Heritability estimate of phenotype between 0 and 1, if available.
--addit-file=<addit_file> Path to the additional file, if available.
--temp-dir=<directory> Directory for temporary files [default: ./tmp/].
--n-cores=<n_cores> Number of cores to be allocated for computation [default: 1].
--prune-prefix=prune_prefix Prefix given to you at the end of pruning stage.
--train-speed=<train_speed> Training speed: (ALL, FAST, FURIOUS, BOOSTED). Run all models, only the fastest models, slightly slower models, or only boosted models, which usually perform best on genotype data [default: BOOSTED].
--cv-reps=<cv_reps> Number of cross-validation repetitions. An integer of 5 or greater. Affects speed [default: 5].
--impute-data=<impute_data> Imputation: (knn, median). Governs secondary imputation and data transformation [default: median].
--grid-search=<grid_search> Grid search length for parameters. An integer of 10 or greater; 30 or greater recommended. Affects speed of initial tune [default: 10].
--best-model-name=<best_model_name> Name for the best model [default: best_model].
--valid-geno-prefix=valid_geno_prefix Prefix with path to the validation genotype files in PLINK format, *.bed, *.bim and *.fam.
--valid-pheno-file=<valid_pheno_file> Path to the validation phenotype file in PLINK format, *.pheno.
--valid-cov-file=<valid_cov_file> Path to the validation covariance file, if available.
--valid-addit-file=<valid_addit_file> Path to the validation additional file, if available.
-h --help Show this screen.
--version Show version.
Examples:
genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno
genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --gwas-file=./exampleData/example_GWAS.txt
genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit
genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.2
genoml data-prune --geno-prefix=./exampleData/example --pheno-file=./exampleData/training.pheno --cov-file=./exampleData/training.cov --gwas-file=./exampleData/example_GWAS.txt --addit-file=./exampleData/training.addit --herit=0.5
genoml model-train --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
genoml model-tune --prune-prefix=./tmp/20181225-230052 --pheno-file=./exampleData/training.pheno
genoml model-validate --prune-prefix=./tmp/20181225-230052 --geno-prefix=./exampleData/training --pheno-file=./exampleData/training.pheno --valid-geno-prefix=./exampleData/validation --valid-pheno-file=./exampleData/validation.pheno
Help:
For help using this tool, please open an issue on the Github repository:
https://github.com/GenoML/genoml-core/issues
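The Options list above constrains several values, and a bad value may only surface after a long run. A minimal sketch of checking them up front follows; check_options is hypothetical and not part of GenoML. It assumes --train-speed and --impute-data are compared exactly as written in the Usage text (case-sensitive), and it treats the documented defaults of --cv-reps (5) and --grid-search (10) as their minimum values, which is an assumption.

```python
from typing import Dict, List

TRAIN_SPEEDS = {"ALL", "FAST", "FURIOUS", "BOOSTED"}
IMPUTE_METHODS = {"knn", "median"}


def check_options(opts: Dict[str, str]) -> List[str]:
    """Return problems found in GenoML option strings; an empty list means OK.

    Keys are flag names as written in the Usage text, e.g. "--cv-reps".
    """
    problems = []
    if "--train-speed" in opts and opts["--train-speed"] not in TRAIN_SPEEDS:
        problems.append("--train-speed must be one of " + ", ".join(sorted(TRAIN_SPEEDS)))
    if "--impute-data" in opts and opts["--impute-data"] not in IMPUTE_METHODS:
        problems.append("--impute-data must be knn or median")
    # Assumed minimums: the documented defaults for cv-reps and grid-search.
    for flag, minimum in (("--cv-reps", 5), ("--grid-search", 10), ("--n-cores", 1)):
        if flag in opts:
            value = opts[flag]
            if not value.isdigit() or int(value) < minimum:
                problems.append(f"{flag} must be an integer >= {minimum}")
    if "--herit" in opts:
        try:
            ok = 0 <= float(opts["--herit"]) <= 1
        except ValueError:
            ok = False
        if not ok:
            problems.append("--herit must be a number between 0 and 1")
    return problems
```

For example, check_options({"--cv-reps": "3"}) returns a single message about --cv-reps.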
Reporting issues
Please report any issues or suggestions on the GenoML-core issues page.