sorting_hat: sort indels into variant classes
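- Sorts indels into the classes defined below:
  - Homopolymer run (hr): the mutation occurs in a region of 6 or more copies of the inserted or deleted nucleotide
  - Copy count change (ccc): the inserted or deleted allele is repeated 1 or more times in the region of the mutation
  - No copy count change (non-ccc): the inserted or deleted allele is not repeated in the region of the mutation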
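The three definitions leave some details open: the coordinate convention, how far to scan, and whether the deleted copy itself counts as a repeat. The sketch below is one possible reading of them, not sorting_hat's actual implementation. It assumes a reference string `ref`, a 0-based index `pos` where the inserted or deleted bases begin, and an allele given without a padding base; `classify_indel` and its helpers are hypothetical names.

```python
# Hypothetical sketch of the hr / ccc / non-ccc rules; not sorting_hat's own code.

def _copies_right(seq: str, start: int, unit: str) -> int:
    """Count back-to-back copies of `unit` in `seq` starting at index `start`."""
    k, n = len(unit), 0
    while seq[start + n * k : start + (n + 1) * k] == unit:
        n += 1
    return n


def _copies_left(seq: str, end: int, unit: str) -> int:
    """Count back-to-back copies of `unit` in `seq` ending just before index `end`."""
    k, n = len(unit), 0
    # Stop before the slice start would go negative and wrap around.
    while end - (n + 1) * k >= 0 and seq[end - (n + 1) * k : end - n * k] == unit:
        n += 1
    return n


def classify_indel(ref: str, pos: int, allele: str, is_deletion: bool,
                   hr_min: int = 6) -> str:
    """Return 'hr', 'ccc' or 'non-ccc' for a single indel (hypothetical helper).

    ref:         reference sequence surrounding the indel
    pos:         0-based index in `ref` where the inserted/deleted bases begin
    allele:      inserted or deleted bases, without a padding base
    is_deletion: True if `allele` was deleted from `ref`, False if inserted
    """
    if not allele:
        raise ValueError("allele must contain at least one base")

    # Homopolymer run: the allele is a single repeated nucleotide and the
    # reference holds hr_min or more copies of that nucleotide at the site.
    if len(set(allele)) == 1:
        base = allele[0]
        run = _copies_left(ref, pos, base) + _copies_right(ref, pos, base)
        if run >= hr_min:
            return "hr"

    # Copy count change: the whole allele also occurs in tandem next to the
    # site. For a deletion, one copy in `ref` is the deleted allele itself.
    copies = _copies_left(ref, pos, allele) + _copies_right(ref, pos, allele)
    if is_deletion:
        copies -= 1
    return "ccc" if copies >= 1 else "non-ccc"


print(classify_indel("GC" + "A" * 7 + "TG", 2, "A", is_deletion=True))  # hr
print(classify_indel("GCCAGCAGTT", 2, "CAG", is_deletion=False))        # ccc
print(classify_indel("ACGTACGGTT", 3, "TTC", is_deletion=False))        # non-ccc
```

Checking the homopolymer rule first means a single-base indel in a long run is reported as hr even though it would also satisfy the ccc rule; that ordering is itself an assumption.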
To use sorting_hat, make sure the following are installed:
To install, use pip:
pip install sorting_hat
Example run
sorting_hat --bed test.bed \
    --fasta test.fasta \
    --repeat repeat_masker.txt
Usage
sorting_hat [-h] -b BED -f FASTA -r REPEAT [-o OUTPUT]
Sort indels into variant classes
  -b BED, --bed BED     Location of BED file with all variants. Must be formatted as Chrom/Start/End/Ref/Alt/PatientID.
  -f FASTA, --fasta FASTA
                        Location of reference fasta file.
  -r REPEAT, --repeat REPEAT
                        Location of RepeatMasker file downloaded from UCSC Genome Browser. Refer to docs to see how to download RepeatMasker.
  -o OUTPUT, --output OUTPUT
                        Name of output file, if not chosen then will print to stdout.
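The help text only fixes the column order of the BED input. A minimal reader might look like the following, assuming tab-separated columns, no header row other than optional `track`/`browser`/`#` lines, and alleles written out as sequences (no `-` placeholder) so that the longer one tells an insertion from a deletion. `Variant`, `read_variants` and `indel_type` are hypothetical names, not part of sorting_hat.

```python
# Hypothetical BED reader for Chrom/Start/End/Ref/Alt/PatientID rows;
# not sorting_hat's own parser.
import io
from typing import Iterator, NamedTuple, TextIO


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    patient_id: str


def read_variants(handle: TextIO) -> Iterator[Variant]:
    """Yield one Variant per tab-separated Chrom/Start/End/Ref/Alt/PatientID line."""
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"line {lineno}: expected 6 columns, got {len(fields)}")
        chrom, start, end, ref, alt, patient_id = fields[:6]
        yield Variant(chrom, int(start), int(end), ref, alt, patient_id)


def indel_type(v: Variant) -> str:
    """Label a variant as an insertion or deletion by comparing allele lengths."""
    if len(v.alt) > len(v.ref):
        return "insertion"
    if len(v.ref) > len(v.alt):
        return "deletion"
    raise ValueError(f"{v.chrom}:{v.start} is not an indel: {v.ref}>{v.alt}")


bed_text = "track name=test\nchr1\t100\t101\tA\tAT\tP01\nchr2\t200\t203\tCAG\tC\tP02\n"
variants = list(read_variants(io.StringIO(bed_text)))
for v in variants:
    print(v.patient_id, v.chrom, v.start, indel_type(v))
```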
To download the RepeatMasker file from the UCSC Genome Browser, see the screenshots in the "data" folder on GitHub: https://github.com/allisonseiden/sorting_hat
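The README does not describe how sorting_hat reads the RepeatMasker table. As an illustration only, the sketch below loads a UCSC `rmsk` export, assumed to be tab-separated with a `#`-prefixed header that names the `genoName`, `genoStart` and `genoEnd` columns, and checks whether a 0-based position falls inside an annotated repeat. Overlapping intervals are merged per chromosome so that a binary search with `bisect` answers each query; `load_repeat_intervals` and `in_repeat` are hypothetical names.

```python
# Hypothetical RepeatMasker lookup; not sorting_hat's own code.
import bisect
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

Merged = Dict[str, Tuple[List[int], List[int]]]


def load_repeat_intervals(handle: TextIO) -> Merged:
    """Read a UCSC rmsk export and merge overlapping repeats per chromosome."""
    # Locate the needed columns by name from the '#'-prefixed header line.
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    ci, si, ei = (header.index(name) for name in ("genoName", "genoStart", "genoEnd"))
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > max(ci, si, ei):
            raw[fields[ci]].append((int(fields[si]), int(fields[ei])))

    merged: Merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in intervals:
            if starts and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlaps or touches: extend
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def in_repeat(merged: Merged, chrom: str, pos: int) -> bool:
    """True if 0-based `pos` lies in a merged half-open [start, end) repeat."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


# Tiny stand-in table; a real export has more columns (bin, swScore, repName, ...).
table = "#genoName\tgenoStart\tgenoEnd\nchr1\t10\t20\nchr1\t15\t30\nchr2\t5\t8\n"
repeats = load_repeat_intervals(io.StringIO(table))
print(in_repeat(repeats, "chr1", 25), in_repeat(repeats, "chr1", 30))  # True False
```

Merging discards per-repeat annotation such as `repName` and `repClass`; keeping the raw intervals instead would be necessary if the repeat type matters.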
Allison Seiden <ahseiden@gmail.com>