PARRIS: Profiling and Annotating ciRcular RNA with Iso-Seq
Getting started
Installation
PARRIS is written in Python; please use pip to install PARRIS:
pip install parris
Alternatively, you can install PARRIS from source by following these steps:
git clone https://github.com/yangao07/PARRIS.git
cd PARRIS
python setup.py install # install main package
pip install -r requirements.txt # install dependencies
In addition, please make sure that [...] is installed on your system.
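One way to check for external programs before running PARRIS is Python's `shutil.which`. The sketch below is illustrative only: it assumes the tools in question are the default executables named in the `parris -h` output (`lordec-correct`, `trf409.legacylinux64`, `fxtools`, `minimap2`), which may differ from your setup. `find_missing` and `DEFAULT_TOOLS` are hypothetical names, not part of PARRIS.

```python
import shutil

# Default executable names as they appear in the `parris -h` output.
# Treating these as the required external tools is an assumption.
DEFAULT_TOOLS = ["lordec-correct", "trf409.legacylinux64", "fxtools", "minimap2"]


def find_missing(tools=DEFAULT_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

If a tool lives outside PATH, its location can instead be passed to PARRIS through the corresponding `--lordec`, `--trf`, `--fxtools`, or `--minimap` option.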
Command example 1:
parris -t 8 long_circRNA.fa reference.fa gene_anno.gtf circRNA.bed output_folder
Command example 2:
parris -t 8 long_circRNA.fa reference.fa gene_anno.gtf circRNA.bed output_folder \
    --short-read short_read.fa \
    --Alu ./anno/hg19/alu.bed \
    --all-repeat ./anno/hg19/all_repeat.bed
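When PARRIS is driven from a script, the same command can be assembled as an argument list. The following is a minimal sketch, not part of PARRIS: `build_parris_cmd` is a hypothetical helper, it uses only options that appear in the `parris -h` output, and it places the options before the positional arguments on the assumption that the command-line parser accepts either order.

```python
import shlex


def build_parris_cmd(long_fa, ref_fa, anno_gtf, circ_anno, out_dir,
                     threads=8, short_reads=(), alu_bed=None,
                     all_repeat_bed=None):
    """Assemble a parris argument list (hypothetical helper)."""
    cmd = ["parris", "-t", str(threads)]
    if short_reads:
        # Per the help text, multiple or paired-end files are joined with ','.
        cmd += ["--short-read", ",".join(short_reads)]
    if alu_bed:
        cmd += ["--Alu", alu_bed]
    if all_repeat_bed:
        cmd += ["--all-repeat", all_repeat_bed]
    # The five positional arguments, in the order shown in the usage line.
    cmd += [long_fa, ref_fa, anno_gtf, circ_anno, out_dir]
    return cmd


cmd = build_parris_cmd(
    "long_circRNA.fa", "reference.fa", "gene_anno.gtf", "circRNA.bed",
    "output_folder",
    short_reads=["short_read.fa"],
    alu_bed="./anno/hg19/alu.bed",
    all_repeat_bed="./anno/hg19/all_repeat.bed",
)
print(shlex.join(cmd))  # shell-quoted form, for logging
# To actually launch PARRIS: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file paths contain spaces.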
Detailed arguments:
parris -h
usage: parris [-h] [-v] [-t THREADS] [--short-read short.fa]
              [--lordec LORDEC] [--kmer KMER] [--solid SOLID] [--trf TRF]
              [--match MATCH] [--mismatch MISMATCH] [--indel INDEL]
              [--match-frac MATCH_FRAC] [--indel-frac INDEL_FRAC]
              [--min-score MIN_SCORE] [--max-period MAX_PERIOD]
              [--fxtools FXTOOLS] [--min-len MIN_LEN] [--min-copy MIN_COPY]
              [--min-frac MIN_FRAC] [--minimap MINIMAP] [-f]
              [--high-max-ratio HIGH_MAX_RATIO]
              [--high-min-ratio HIGH_MIN_RATIO]
              [--high-iden-ratio HIGH_IDEN_RATIO]
              [--high-repeat-ratio HIGH_REPEAT_RATIO]
              [--low-repeat-ratio LOW_REPEAT_RATIO] [--Alu ALU]
              [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT] [-s SITE_DIS]
              [-S END_DIS]
              long.fa ref.fa anno.gtf circRNA.bed/gtf output

PARRIS: Profiling and Annotating ciRcular RNA with Iso-Seq

positional arguments:
  long.fa               Long read data generated from long-read circRNA
                        sequencing technique.
  ref.fa                Reference genome sequence file.
  anno.gtf              Whole gene annotation file in GTF format.
  circRNA.bed/gtf       circRNA annotation file in BED12 or GTF format.
  output                Output directory for final result and temporary files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

General options:
  -t THREADS, --threads THREADS
                        Number of thread to use. (default: 8)

Hybrid error-correction with short-read data (LoRDEC):
  --short-read short.fa
                        Short-read data for error correction. Use ',' to
                        connect multiple or paired-end short read data.
                        (default: )
  --lordec LORDEC       Path to lordec-correct. (default: lordec-correct)
  --kmer KMER           k-mer size. (default: 21)
  --solid SOLID         Solid k-mer abundance threshold. (default: 3)

Detecting tandem-repeat with TRF(Tandem Repeat Finder):
  --trf TRF             Path to trf program. (default: trf409.legacylinux64)
  --match MATCH         Match score. (default: 2)
  --mismatch MISMATCH   Mismatch penalty. (default: 7)
  --indel INDEL         Indel penalty. (default: 7)
  --match-frac MATCH_FRAC
                        Match probability. (default: 80)
  --indel-frac INDEL_FRAC
                        Indel probability. (default: 10)
  --min-score MIN_SCORE
                        Minimum alignment score to report. (default: 100)
  --max-period MAX_PERIOD
                        Maximum period size to report. (default: 2000)

Extracting and aligning consensus sequence to genome (minimap2):
  --fxtools FXTOOLS     Path to fxtools. (default: fxtools)
  --min-len MIN_LEN     Minimum consensus length to keep. (default: 30)
  --min-copy MIN_COPY   Minimum copy number of consensus to keep.
                        (default: 2.0)
  --min-frac MIN_FRAC   Minimum fraction of original long read to keep.
                        (default: 0.0)
  --minimap MINIMAP     Path to minimap2. (default: minimap2)
  -f, --do-classify     Classify circRNA alignment into high-quality and
                        low-quality. (default: False)
  --high-max-ratio HIGH_MAX_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        alignment. (default: 1.1)
  --high-min-ratio HIGH_MIN_RATIO
                        Minimum mappedLen /consLen ratio for high-quality
                        alignment. (default: 0.9)
  --high-iden-ratio HIGH_IDEN_RATIO
                        Minimum identicalBases/ consLen ratio for
                        high-quality alignment. (default: 0.75)
  --high-repeat-ratio HIGH_REPEAT_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        self-tandem consensus. (default: 0.6)
  --low-repeat-ratio LOW_REPEAT_RATIO
                        Minimum mappedLen / consLen ratio for low-quality
                        self-tandem alignment. (default: 1.9)

Evaluating circRNA with annotation:
  --Alu ALU             Alu repetitive element annotation in BED format.
                        (default: )
  --flank-len FLANK_LEN
                        Length of upstream and downstream flanking sequence
                        to search for Alu. (default: 500)
  --all-repeat ALL_REPEAT
                        All repetitive element annotation in BED format.
                        (default: )
  -s SITE_DIS, --site-dis SITE_DIS
                        Allowed distance between circRNA internal-splice-site
                        and annoated splice-site. (default: 0)
  -S END_DIS, --end-dis END_DIS
                        Allowed distance between circRNA back-splice-site and
                        annoated splice-site. (default: 10)
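To make the `--high-max-ratio`, `--high-min-ratio`, and `--high-iden-ratio` thresholds concrete, here is a minimal sketch of how such a check could look. It is not taken from the PARRIS source: the function name `is_high_quality`, the inclusive comparisons, and the choice to require all three conditions at once are assumptions. The self-tandem thresholds (`--high-repeat-ratio`, `--low-repeat-ratio`) are left out.

```python
def is_high_quality(mapped_len, cons_len, identical_bases,
                    high_min_ratio=0.9, high_max_ratio=1.1,
                    high_iden_ratio=0.75):
    """Apply the documented default thresholds to one consensus alignment.

    Hypothetical helper; assumes inclusive bounds and that every
    condition must hold.
    """
    if cons_len <= 0:
        return False  # no consensus sequence, nothing to classify
    mapped_ratio = mapped_len / cons_len     # mappedLen / consLen
    iden_ratio = identical_bases / cons_len  # identicalBases / consLen
    return (high_min_ratio <= mapped_ratio <= high_max_ratio
            and iden_ratio >= high_iden_ratio)


# A 500-base consensus with 490 mapped and 450 identical bases passes.
print(is_high_quality(mapped_len=490, cons_len=500, identical_bases=450))
# A consensus that maps over only half of its length does not.
print(is_high_quality(mapped_len=250, cons_len=500, identical_bases=240))
```

According to the help text, this classification is applied only when `-f` / `--do-classify` is given.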
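Changelog (v1.5.9)
- Fixed a bug in the search for known splice junctions.
- Use known and canonical internal splice-site strand information to guide the search for back-splice junctions.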