Associating outliers with rare variation
Detailed description of the ore Python project
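This is a quick overview of how to use ORE (Outlier-RV Enrichment); for more details, see the latest ORE documentation. Before installing, confirm that the required dependencies are installed.
Then, on the command line, run: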
pip install ore
Example run
ore --vcf test.vcf.gz \
    --bed test.bed.gz \
    --output ore_results \
    --distribution normal \
    --threshold 2 \
    --max_outliers_per_id 500 \
    --af_rare 0.05 0.01 1e-3 \
    --tss_dist 5000
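Variants and gene expression are specified with --vcf (line 1) and --bed (line 2), respectively. The output prefix is given with --output (line 3). In this example, the outlier specifications --distribution (line 4), --threshold (line 5), and --max_outliers_per_id (line 6) indicate that outliers are defined using a normal distribution with a z-score greater than 2, and that samples with more than 500 outliers are excluded. Variant information is specified with --af_rare (line 7) and --tss_dist (line 8): variants are defined as rare at several intra-cohort allele frequency thresholds (less than 0.05, 0.01, and 0.001), and only variants within 5 kb of the TSS are used.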
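To make the outlier specification in the example concrete, here is a minimal Python sketch of the idea. It is not ORE's implementation. It assumes a hypothetical in-memory layout (a dict mapping each gene to per-sample expression values), assumes outliers are called on the absolute z-score, and assumes a sample exceeding the outlier cap is dropped everywhere. The function name `call_outliers` and its parameters are illustrative only.

```python
from collections import Counter
from statistics import mean, pstdev


def call_outliers(expr, threshold=2.0, max_outliers_per_id=500):
    """Call z-score outliers per gene, then drop samples with too many outliers.

    expr: hypothetical layout {gene: {sample_id: expression_value}}.
    Returns (outliers_per_gene, excluded_samples).
    """
    per_gene = {}
    for gene, by_sample in expr.items():
        values = list(by_sample.values())
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            # No variation in this gene, so nothing can be an outlier.
            per_gene[gene] = set()
            continue
        # Assumption: use the absolute z-score, so both tails count.
        per_gene[gene] = {
            sample for sample, x in by_sample.items()
            if abs(x - mu) / sd > threshold
        }

    # Count outliers per sample and exclude samples above the cap.
    counts = Counter(s for samples in per_gene.values() for s in samples)
    excluded = {s for s, n in counts.items() if n > max_outliers_per_id}
    kept = {gene: samples - excluded for gene, samples in per_gene.items()}
    return kept, excluded


# Toy data (made up for illustration): sample s6 is extreme in both genes.
expr = {
    "GENE_A": {"s1": 0.0, "s2": 0.1, "s3": -0.1, "s4": 0.05, "s5": -0.05, "s6": 10.0},
    "GENE_B": {"s1": 5.0, "s2": 5.2, "s3": 4.8, "s4": 5.1, "s5": 4.9, "s6": -20.0},
}
outliers, excluded = call_outliers(expr, threshold=2.0, max_outliers_per_id=500)
print(outliers, excluded)
```

Lowering `max_outliers_per_id` to 1 in this toy data would exclude `s6`, since it is an outlier in two genes.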
Usage (for details, see the latest ORE documentation)
ore [-h] [--version] -v VCF -b BED [-o OUTPUT]
    [--outlier_output OUTLIER_OUTPUT] [--enrich_file ENRICH_FILE]
    [--extrema] [--distribution {normal,rank,custom}]
    [--threshold [THRESHOLD [THRESHOLD ...]]]
    [--max_outliers_per_id MAX_OUTLIERS_PER_ID]
    [--af_rare [AF_RARE [AF_RARE ...]]] [--af_vcf]
    [--intracohort_rare_ac INTRACOHORT_RARE_AC] [--gq GQ] [--dp DP]
    [--aar AAR AAR] [--tss_dist [TSS_DIST [TSS_DIST ...]]]
    [--upstream] [--downstream] [--annovar]
    [--variant_class {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}]
    [--exon_class {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}]
    [--refgene] [--ensgene] [--annovar_dir ANNOVAR_DIR]
    [--humandb_dir HUMANDB_DIR] [--processes PROCESSES] [--clean_run]
- Required arguments:
    -v VCF, --vcf VCF
        Location of VCF file. Must be tabixed!
    -b BED, --bed BED
        Gene expression file location. Must be tabixed!
- Optional file locations:
    -o OUTPUT, --output OUTPUT
        Output prefix (default is VCF prefix)
    --outlier_output OUTLIER_OUTPUT
        Outlier filename (default is VCF prefix)
    --enrich_file ENRICH_FILE
        Output file for enrichment odds ratios and p-values (default is VCF prefix)
- Optional outlier arguments:
    --extrema
        Only the most extreme value is an outlier
    --distribution DISTRIBUTION
        Outlier distribution. Options: {normal,rank,custom}
    --threshold THRESHOLD
        Expression threshold for defining outliers. Must be greater than 0 for normal or in (0,0.5), non-inclusive, with rank. Ignored with custom
    --max_outliers_per_id MAX_OUTLIERS_PER_ID
        Maximum number of outliers per ID
- Optional variant-related arguments:
    --af_rare AF_RARE
        AF cut-off below which a variant is considered rare (space-separated list, e.g., 0.1 0.05)
    --af_vcf
        Use the VCF AF field to define an allele as rare.
    --intracohort_rare_ac INTRACOHORT_RARE_AC
        Allele COUNT to be used instead of intra-cohort allele frequency (still uses af_rare for the population-level AF cut-off)
    --af_min AF_MIN
        Lower bound on AF cut-offs for --af_rare; must be the same length as --af_rare (e.g., with --af_rare 0.01 0.5 and --af_min 0 0.05, ORE will compare variants within [0,0.01] and [0.05,0.5] to other variants).
    --gq GQ
        Minimum genotype quality for each variant in each individual
    --dp DP
        Minimum depth per variant in each individual
    --aar AAR
        Alternate allelic ratio for heterozygous variants (provide two space-separated numbers between 0 and 1, e.g., 0.2 0.8)
    --tss_dist TSS_DIST
        Variants within this distance of the TSS are considered
    --upstream
        Only variants UPstream of TSS
    --downstream
        Only variants DOWNstream of TSS
- Optional arguments for using ANNOVAR:
    --annovar
        Use ANNOVAR to specify allele frequencies and functional class
    --variant_class
        Only variants in these classes will be considered. Options: {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA}
    --exon_class
        Only variants with these exonic impacts will be considered. Options: {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}
    --refgene
        Filter on RefGene function.
    --ensgene
        Filter on ENSEMBL function.
    --annovar_dir ANNOVAR_DIR
        Directory of the table_annovar.pl script
    --humandb_dir HUMANDB_DIR
        Directory of ANNOVAR data (refGene, ensGene, and gnomad_genome)
- Optional arguments:
    -h, --help
        show this help message and exit
    --version
        show program's version number and exit
    --processes PROCESSES
        Number of CPU processes
    --clean_run
        Delete temporary files from the previous run
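The pairing of --af_rare with --af_min described in the options can be pictured with a small Python sketch. This is an illustration under stated assumptions, not ORE's code: the helper names `af_intervals` and `in_interval` are hypothetical, and the intervals are treated as closed on both ends because the help text writes them as [0,0.01] and [0.05,0.5]; the real tool may handle boundaries differently.

```python
def af_intervals(af_rare, af_min=None):
    """Pair each upper AF cut-off with its lower bound as (low, high) tuples.

    Hypothetical helper: when af_min is omitted, every lower bound is 0.
    """
    if af_min is None:
        af_min = [0.0] * len(af_rare)
    if len(af_min) != len(af_rare):
        raise ValueError("af_min must be the same length as af_rare")
    intervals = list(zip(af_min, af_rare))
    for low, high in intervals:
        if not 0 <= low <= high <= 1:
            raise ValueError(f"invalid AF interval: [{low}, {high}]")
    return intervals


def in_interval(af, interval):
    """Return True if an allele frequency falls inside a closed interval (assumed)."""
    low, high = interval
    return low <= af <= high


# The example from the help text: --af_rare 0.01 0.5 with --af_min 0 0.05
intervals = af_intervals([0.01, 0.5], [0, 0.05])
for af in (0.005, 0.02, 0.2):
    print(af, [in_interval(af, iv) for iv in intervals])
```

In this sketch an allele frequency of 0.02 falls in neither interval, which is the gap that the two bounds carve out between 0.01 and 0.05.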
Felix Richter <felix.richter@icahn.mssm.edu>