This module helps call structural variants from human NGS datasets.



iCallSV: Structural Aberration Detection from NGS datasets

Author: Ronak H Shah
Contact: rons.shah@gmail.com
Source code: http://github.com/rhshah/iCallSV
Wiki: http://icallsv.readthedocs.io/en/latest/
License: Apache License 2.0
DOI badge: https://zenodo.org/badge/DOI/10.5281/zenodo.184864.svg
Coverage badge: https://codecov.io/gh/rhshah/iCallSV/branch/master/graph/badge.svg

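iCallSV is a Python library and command-line software toolkit for calling structural aberrations from next-generation DNA sequencing data. Behind the scenes, it uses Delly2 for structural variant calling. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and with short-read sequencing platforms such as Illumina.

The filtering process can be viewed here: Workflow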

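Citation

We are in the process of publishing a manuscript describing iCallSV as part of the Structural Variant Detection framework. If you use this software in a publication, please cite our website iCallSV.

Note: For some reason, Read The Docs does not display the docstrings.

Please use these URLs from the GitHub HTML Preview to get information on each module: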

Per Module Info

Required Packages

We require that you install:

pandas:v0.16.2
biopython:v1.65
pysam:v0.8.4
pyvcf:0.6.7
Delly:v0.7.5
targetSeqView:master
iAnnotateSV:v1.0.6
coloredlogs:v5.2

Required Data Files

These files are located in the data folder inside iCallSV.

Configuration File Format

#~~~Template configuration file to run iCallSV~~~#
#### Path to python executable ###
[Python]
PYTHON:
#### Path to R executable and R Lib ###
[R]
RHOME:
RLIB:
#### Path to delly, bcftools executables and Version of delly (supports only 0.7.3)###
[SVcaller]
DELLY:
DellyVersion:
BCFTOOLS:
#### Path to hg19 Reference Fasta file ###
[ReferenceFasta]
REFFASTA:
#### Path to file containing regions to exclude please follow Delly documentation for this ###
[ExcludeRegion]
EXREGIONS:
#### Path to file containing regions to where lenient threshold will be used; and file containing genes to keep ###
[HotSpotRegions]
HotspotFile:
GenesToKeep:
#### Path to file containing regions/genes to filter ###
[BlackListRegions]
BlackListFile:
BlackListGenes:
#### Path to samtools executable ###
[SAMTOOLS]
SAMTOOLS:
#### Path to iAnnotateSV.py and all its required files, please follow iAnnotateSV documentation ###
[iAnnotateSV]
ANNOSV:
GENOMEBUILD:
DISTANCE:
CANONICALTRANSCRIPTFILE:
UNIPROTFILE:
CosmicCensus:
CosmicFusionCounts:
RepeatRegionAnnotation:
DGvAnnotations:
#### TargetSeqView Parameters ###
[TargetSeqView]
CalculateConfidenceScore:
GENOMEBUILD:
ReadLength:
#### Parameters to run Delly ###
[ParametersToRunDelly]
MAPQ: 20
NumberOfProcessors: 4
[ParametersToFilterDellyResults]
####Case Allele Fraction Hotspot####
CaseAltFreqHotspot: 0.05
####Total Case Coverage Hotspot#####
CaseCoverageHotspot=5
####Control Allele Fraction Hotspot####
ControlAltFreqHotspot=0
####Case Allele Fraction####
CaseAltFreq: 0.10
####Total Case Coverage#####
CaseCoverage=10
####Control Allele Fraction####
ControlAltFreq=0
###Overall Supporting Read-pairs ###
OverallSupportingReads: 5
###Overall Supporting Read-pairs Hotspot ###
OverallSupportingReadsHotspot: 3
###Overall Supporting splitreads ###
OverallSupportingSplitReads: 0
###Overall Supporting splitreads Hotspot ###
OverallSupportingSplitReadsHotspot: 0
###Case Supporting Read-pairs ###
CaseSupportingReads: 2
###Case Supporting splitreads ###
CaseSupportingSplitReads: 0
###Case Supporting Read-pairs Hotspot ###
CaseSupportingReadsHotspot: 1
###Case Supporting splitreads Hotspot ###
CaseSupportingSplitReadsHotspot: 0
###Control Supporting Read-pairs ###
ControlSupportingReads: 3
###Control Supporting Read-pairs Hotspot ###
ControlSupportingReadsHotspot: 3
###Control Supporting splitreads ###
ControlSupportingSplitReads: 3
###Control Supporting splitreads Hotspot ###
ControlSupportingSplitReadsHotspot: 3
###Length of Structural Variant###
LengthOfSV: 500
###Overall Mapping Quality Threshold###
OverallMapq: 20
###Overall Mapping Quality Threshold Hotspot###
OverallMapqHotspot: 5
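To sanity-check a filled-in template before launching a run, the file can be read with Python's standard configparser module. The sketch below is only an illustration and is not taken from iCallSV itself: the names SAMPLE_CONFIG and load_config are hypothetical, and the sample text is a shortened excerpt of the template above. It relies on configparser accepting both ":" and "=" as delimiters and treating lines that start with "#" as comments, which matches the mixed style used in the template.

```python
import configparser

# Shortened excerpt of the template above (hypothetical sample, not a full config).
SAMPLE_CONFIG = """\
#### Parameters to run Delly ###
[ParametersToRunDelly]
MAPQ: 20
NumberOfProcessors: 4
[ParametersToFilterDellyResults]
####Case Allele Fraction Hotspot####
CaseAltFreqHotspot: 0.05
####Total Case Coverage Hotspot#####
CaseCoverageHotspot=5
####Case Allele Fraction####
CaseAltFreq: 0.10
####Total Case Coverage#####
CaseCoverage=10
###Length of Structural Variant###
LengthOfSV: 500
"""


def load_config(text):
    """Parse configuration text and return a ConfigParser instance."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive so "CaseAltFreq" is not lowercased.
    parser.optionxform = str
    parser.read_string(text)
    return parser


cfg = load_config(SAMPLE_CONFIG)
delly = cfg["ParametersToRunDelly"]
filters = cfg["ParametersToFilterDellyResults"]

print("Delly MAPQ:", delly.getint("MAPQ"))
print("Case allele fraction:", filters.getfloat("CaseAltFreq"))
print("Case allele fraction (hotspot):", filters.getfloat("CaseAltFreqHotspot"))
print("Minimum SV length:", filters.getint("LengthOfSV"))
```

For a real file, parser.read("/path/to/template.ini") would replace read_string. Path entries left empty in the template (for example PYTHON:) parse as empty strings, so a check for empty values is an easy way to spot settings that still need to be filled in.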

Quick Usage

python iCallSV.py -sc /path/to/template.ini -abam /path/to/casebamFile -bbam /path/to/controlbamFile -aId caseID -bId controlId -o /path/to/output/directory -op prefix_for_the_output_files
> python iCallSV.py -h

usage: iCallSV.py [-h][-v][-V] -sc config.ini -abam caseBAMFile.bam -bbam
                  controlBAMFile.bam -aId caseID -bId controlID -o
                  /somepath/output -op TumorID

iCallSV.iCallSV -- wrapper to run iCallSV package

  Created by Ronak H Shah on 2015-03-30.
  Copyright 2015-2016 Ronak H Shah. All rights reserved.

  Licensed under the Apache License 2.0
  http://www.apache.org/licenses/LICENSE-2.0

  Distributed on an "AS IS" basis without warranties
  or conditions of any kind, either express or implied.

USAGE

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         set verbosity level [default: True]
  -V, --version         show program's version number and exit
  -sc config.ini, --svConfig config.ini
                        Full path to the structural variant configuration
  -abam caseBAMFile.bam, --caseBam caseBAMFile.bam
                        Full path to the case bam file
  -bbam controlBAMFile.bam, --controlBam controlBAMFile.bam
                        Full path to the control bam file
  -aId caseID, --caseId caseID
                        Id of the case to be analyzed, this will be the sub-
                        folder
  -bId controlID, --controlId controlID
                        Id of the control to be used, this will be used for
                        filtering variants
  -o /somepath/output, --outDir /somepath/output
                        Full Path to the output dir.
  -op TumorID, --outPrefix TumorID
                        Id of the Tumor bam file which will be used as the
                        prefix for output files
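When iCallSV is driven from another Python script, the command line can be assembled as a list of arguments. The following is a minimal sketch that uses only the flags listed in the help text above; the function name build_icallsv_command and all paths and IDs in the example are hypothetical placeholders.

```python
import shlex


def build_icallsv_command(config, case_bam, control_bam, case_id, control_id,
                          out_dir, out_prefix, python="python",
                          script="iCallSV.py", verbose=False):
    """Return the iCallSV command as an argument list (nothing is executed)."""
    cmd = [
        python, script,
        "-sc", config,         # structural variant configuration file
        "-abam", case_bam,     # case BAM file
        "-bbam", control_bam,  # control BAM file
        "-aId", case_id,       # case ID
        "-bId", control_id,    # control ID
        "-o", out_dir,         # output directory
        "-op", out_prefix,     # prefix for output files
    ]
    if verbose:
        cmd.append("-v")
    return cmd


# Placeholder values for illustration only.
cmd = build_icallsv_command(
    config="/path/to/template.ini",
    case_bam="/path/to/case.bam",
    control_bam="/path/to/control.bam",
    case_id="caseID",
    control_id="controlID",
    out_dir="/path/to/output/directory",
    out_prefix="outputPrefix",
    verbose=True,
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when paths contain spaces.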

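Running on SGE or LSF

Note:

For both SGE and LSF, you need to provide the total number of cores based on the number of threads you have assigned to the Delly installation using OMP_NUM_THREADS.

Note:

For example, if you set OMP_NUM_THREADS with export OMP_NUM_THREADS=3, then you need to set the total number of cores to 13 (12 + 1 extra as a buffer), so that each Delly program uses 3 cores. Here I use Python's multiprocessing module to launch Delly, so all four programs are launched as separate processes, each using the number of threads set through OMP_NUM_THREADS.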
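The arithmetic behind the core count in the example can be written down as a small helper. This is a sketch only: the function name total_cores is hypothetical, and the defaults of four Delly processes and one buffer core are taken from the example above.

```python
def total_cores(omp_num_threads, delly_processes=4, buffer_cores=1):
    """Cores to request from the scheduler.

    Each Delly process uses `omp_num_threads` threads, the processes run
    side by side, and `buffer_cores` extra cores are added as a buffer.
    """
    if omp_num_threads < 1 or delly_processes < 1 or buffer_cores < 0:
        raise ValueError("thread and process counts must be positive")
    return omp_num_threads * delly_processes + buffer_cores


# OMP_NUM_THREADS=3 with four Delly processes: 3 * 4 + 1
print(total_cores(3))  # 13
```

The result is the value to pass to -pe smp on SGE or -n on LSF.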

SGE

qsub -q some.q -N iCallSV_JobName -o iCallSV.stdout -e iCallSV.stderr -V -l h_vmem=6G,virtual_free=6G -pe smp 13 -wd /some/path/to/working/dir -sync y  -b y python iCallSV.py -sc template.ini -bbam control.bam -abam case.bam -aId caseID -bId controlID -op outputPrefix -o  /some/path/to/output/dir -v

LSF

bsub -q some.q -J iCallSV_JobName -o iCallSV.stdout -e iCallSV.stderr -We 24:00 -R "rusage[mem=20]" -M 30 -n 13 -cwd /some/path/to/working/dir "python iCallSV.py -sc template.ini -bbam control.bam -abam case.bam -aId caseID -bId controlID -op outputPrefix -o  /some/path/to/output/dir -v"

Utilities

Running iCallSV on MSK-IMPACT pools

This is only for MSK-IMPACT internal samples.

> python iCallSV_dmp_wrapper.py -h

usage: iCallSV_dmp_wrapper.py [options]

Run iCallSV on selected pools using MSK data

optional arguments:
  -h, --help            show this help message and exit
  -fl folders.fof, --folderList folders.fof
                        Full path folders file of files.
  -qc /some/path/qcLocation, --qcLocation /some/path/qcLocation
                        Full path qc files.
  -b /some/path/bamlocation, --bamLocation /some/path/bamlocation
                        Full path bam files.
  -P /somepath/python, --python /somepath/python
                        Full path Pyhton executables.
  -icsv /somepath/iCallSV.py, --iCallSV /somepath/iCallSV.py
                        Full path iCallSV.py executables.
  -conf /somepath/template.ini, --iCallSVconf /somepath/template.ini
                        Full path configuration file to run iCallSV
  -q all.q or clin.q, --queue all.q or clin.q
                        Name of the SGE queue
  -qsub /somepath/qsub, --qsubPath /somepath/qsub
                        Full Path to the qsub executables of SGE.
  -t 5, --threads 5     Number of Threads to be used to run iCallSV
  -v, --verbose         make lots of noise [default]
  -o /somepath/output, --outDir /somepath/output
                        Full Path to the output dir.
  -of outputfile.txt, --outDir outputfile.txt
                                            Name of the final output file.

Checking samples run through iCallSV for processed transcript/cDNA contamination

> python check_cDNA_contamination.py -h
usage: check_cDNA_contamination.py [options]

Calculate cDNA contamination per sample based of the Structural Variants
Pipeline result

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         make lots of noise [default]
  -s SVfile.txt, --svFile SVfile.txt
                        Location of the structural variant file to be used
  -o cDNA_contamination, --outputFileName cDNA_contamination
                        Full path name for the output file


BlackListFile:

(blacklist.txt) Tab-delimited file without a header, listing blacklisted regions in this order:

chromosome 1, breakpoint 1, chromosome 2, breakpoint 2

Example: 7 140498077 5 175998094
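A file in this layout can be loaded with a few lines of Python. The sketch below assumes, based on the example row, that the four columns are the chromosome and position of the first breakpoint followed by the chromosome and position of the second. The function names are hypothetical, and the exact-match lookup is an assumption made for illustration; it does not describe how iCallSV itself applies the blacklist.

```python
import csv
import io


def load_blacklist(handle):
    """Read a headerless tab-delimited blacklist into a set of tuples.

    Each tuple is (chr1, pos1, chr2, pos2), with positions as integers.
    Rows with fewer than four columns are skipped.
    """
    entries = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue
        chr1, pos1, chr2, pos2 = row[:4]
        entries.add((chr1, int(pos1), chr2, int(pos2)))
    return entries


def is_blacklisted(entries, chr1, pos1, chr2, pos2):
    """Exact-match lookup (an assumption, not iCallSV's documented rule)."""
    return (chr1, pos1, chr2, pos2) in entries


# In-memory stand-in for blacklist.txt, using the example row above.
sample = io.StringIO("7\t140498077\t5\t175998094\n")
blacklist = load_blacklist(sample)
print(is_blacklisted(blacklist, "7", 140498077, "5", 175998094))
```

For a real file, open("blacklist.txt", newline="") would replace the StringIO object.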
BlackListGenes:

(blacklistgenes.txt) Genes to remove, listed one per line, without a header.

Example:

LINC00486

CNOT4

HotspotFile:

(hotspotgenes.txt) Tab-delimited file without a header, listing hotspot regions in this order:

chromosome, start, end, name

Example: 2 29416089 30143525 ALK
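The configuration template describes the hotspot regions as the places where a more lenient threshold is used. A rough sketch of that idea: load the regions, test whether a breakpoint falls inside one, and pick the case allele fraction cutoff accordingly. The function names, the inclusive interval bounds, and the per-breakpoint lookup are assumptions made for illustration; the two cutoffs are the CaseAltFreqHotspot and CaseAltFreq values from the template.

```python
from collections import namedtuple

Region = namedtuple("Region", "chrom start end name")


def load_hotspots(lines):
    """Parse headerless tab-delimited lines: chromosome, start, end, name."""
    regions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, name = fields[:4]
        regions.append(Region(chrom, int(start), int(end), name))
    return regions


def hotspot_for(regions, chrom, pos):
    """Return the hotspot name containing the position, or None.

    Bounds are treated as inclusive, which is an assumption.
    """
    for region in regions:
        if region.chrom == chrom and region.start <= pos <= region.end:
            return region.name
    return None


def case_alt_freq_cutoff(in_hotspot, hotspot_cutoff=0.05, default_cutoff=0.10):
    """Pick the lenient cutoff inside hotspots, the default one elsewhere."""
    return hotspot_cutoff if in_hotspot else default_cutoff


# Example row from above, as it would appear in hotspotgenes.txt.
hotspots = load_hotspots(["2\t29416089\t30143525\tALK\n"])
name = hotspot_for(hotspots, "2", 29500000)
print(name, case_alt_freq_cutoff(name is not None))
```

A linear scan is enough for a short hotspot list; a much larger list would call for an interval index.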
GenesToKeep:

(genestokeinlude.txt) Genes to keep, listed one per line, without a header.

Example:

ALK

BRAF