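RM-seq is a bioinformatics tool for assessing resistance mutations from paired-end (PE) short reads.
Detailed description of the rmseq Python project
A bioinformatics pipeline for high-throughput assessment of antibiotic resistance mutations. RM-seq is an amplicon-based deep-sequencing technique that uses single-molecule barcoding. We have adapted this method to identify and characterise antibiotic resistance mutations.
A complete description of the RM-seq workflow is available here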
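Is this the right tool for me?
- To use this pipeline, you need to have sequenced an amplicon library with molecular barcodes.
- It only supports paired-end FASTQ reads (including compressed .gz FASTQ files).
- It requires the paired reads to overlap.
- It needs a reference FASTA sequence of the sequenced gene (DNA sequence).
- It is written in Python 3 and Perl.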
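The input requirements above can be partly checked before starting a run. The following is a minimal sketch, not part of RM-seq itself: the function names (`open_fastq`, `count_reads`, `check_inputs`) are hypothetical, and it only verifies that the files exist, that the reference looks like FASTA, and that R1 and R2 hold the same number of reads. It does not test whether the read pairs overlap.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count records in a FASTQ file, assuming the usual 4 lines per read."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_inputs(r1, r2, reference):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for label, path in (("R1", r1), ("R2", r2), ("reference", reference)):
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")
    if problems:
        return problems
    with open(reference, "rt") as handle:
        first = handle.readline()
    if not first.startswith(">"):
        problems.append("reference does not look like FASTA (no '>' header)")
    n1, n2 = count_reads(r1), count_reads(r2)
    if n1 != n2:
        problems.append(f"R1 has {n1} reads but R2 has {n2}")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.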
Installation
Install the RM-seq pipeline
pip3 install rmseq
Dependencies
RM-seq has the following package dependencies:
- EMBOSS >= 6.6 (clustalo, cons, getorf, diffseq)
- Clustal Omega >= 1.2.1
- bwa >= 0.7.15
- samtools >= 1.3
- bedtools >= 2.26.0
- pear >= 0.9.10
- cd-hit >= 4.7
- trimmomatic >= 0.36
- seqtk >= 1.3-r106 (only when subsampling reads)
- Python modules: plumbum, Biopython
brew tap homebrew/science
brew tap tseemann/bioinformatics-linux
brew install parallel; parallel --citation # please write will cite
brew install bedtools
brew install EMBOSS
brew install clustal-omega
brew install bwa
brew install samtools
brew install pear
brew install cd-hit
brew install trimmomatic
brew install seqtk
pip3 install plumbum
pip3 install biopython
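After installing, it can be useful to confirm that the external programs are on your PATH. The pipeline ships its own `rmseq check` command for this; the sketch below is only an independent illustration using the standard library. The executable names are assumptions inferred from the dependency list (for example, a package manager may install trimmomatic or cd-hit under a different name), and `missing_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions based on the dependency list;
# a package may install its binary under a different name on your system.
REQUIRED_TOOLS = [
    "clustalo", "cons", "getorf", "diffseq",
    "bwa", "samtools", "bedtools", "pear", "cd-hit", "trimmomatic",
]
OPTIONAL_TOOLS = ["seqtk"]  # only needed when subsampling reads


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```

This only tests for presence, not for the minimum versions listed above.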
Quick start
Run
rmseq
to display the help:
usage: rmseq [-h] ...

Run RM-seq pipeline.

optional arguments:
  -h, --help  show this help message and exit

Commands:
    run       Run the pipeline.
    version   Print version.
    check     Check pipeline dependencies
    test      Run the test data set.
Check that the dependencies are installed
rmseq check
Run the test data set
rmseq test
To run the analysis pipeline, follow the instructions shown by
rmseq run -h

usage: rmseq run [options]

Run the pipeline

positional arguments:
  R1                    Path to read pair 1
  R2                    Path to read pair 2
  refnuc                Reference sequence that will be used for premapping
                        filtering and mutation annotation (fasta).
  outdir                Output directory.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug_on        Switch on debug mode.
  -f, --force           Force overwite of existing.
  -b BARLEN, --barlen BARLEN
                        Length of barcode (default 16)
  -m MINFREQ, --minfreq MINFREQ
                        Minimum barcode frequency to keep (default 5)
  -q BASEQUAL, --basequal BASEQUAL
                        Minimum base quality threshold used for trimming the
                        end of reads (trimmomatic TRAILING argument)
                        (default 30)
  -c CPUS, --cpus CPUS  Number of CPUs to use (default 72)
  -t TRANSLATION, --translation TRANSLATION
                        Manually set the reading frame for translation
                        (use 1, 2 or 3 - use getorf by default)
  -r MINSIZE, --minsize MINSIZE
                        Minimum ORF size in bp used when annotating variants
                        (default 200)
  -w WSIZE, --wsize WSIZE
                        Word-size option to pass to diffseq for comparison
                        with reference sequence (default 5)
  -s SUBSAMPLE, --subsample SUBSAMPLE
                        Only examine this many reads.
  -k, --keepfiles       Keep the intermediate files (default remove)
  -n, --noaln           Skip reads alignment when generating consensus (to
                        use for indel quantification only) (default align)
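When several samples need processing, the command line can be assembled from a script. The sketch below builds an `rmseq run` argument list from the positional arguments and a few of the options listed in the help text. The helper names and the file names in the usage note are hypothetical, and `cpus=4` is an illustrative value chosen here rather than the documented default of 72.

```python
import subprocess


def build_rmseq_command(r1, r2, refnuc, outdir, barlen=16, minfreq=5,
                        cpus=4, keepfiles=False):
    """Assemble the argument list for `rmseq run` from options in the help text."""
    cmd = ["rmseq", "run",
           "--barlen", str(barlen),
           "--minfreq", str(minfreq),
           "--cpus", str(cpus)]
    if keepfiles:
        cmd.append("--keepfiles")
    # Positional arguments follow in the order given by the usage message.
    cmd += [r1, r2, refnuc, outdir]
    return cmd


def run_rmseq(*args, **kwargs):
    """Run the pipeline and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_rmseq_command(*args, **kwargs), check=True)
```

For example, `run_rmseq("sample_R1.fastq.gz", "sample_R2.fastq.gz", "rpoB.fa", "Rifampicin1")` would launch one run, assuming those files exist and `rmseq` is installed. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.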
Check the version
rmseq version
Output
RM-seq produces a tab-separated output file called amplicons.effect, in which each row corresponds to one consensus amplicon (in the sequenced population):
Column | Example | Description |
---|---|---|
barcode | GACACAACTGAGATTA | sequence of the barcode |
sample | Rifampicin1 | output folder name |
prot_mutation | H481N | annotation of the amino acid change (Histidine residue 481 substituted by Asparagine) |
prot_start | 481 | start coordinate of the mutation |
prot_end | 481 | end coordinate of the mutation |
nuc_mutation | C1443G | annotation of the nucleotide change |
nuc_start | 1443 | start coordinate of the nucleotide change |
nuc_end | 1443 | end coordinate of the nucleotide change |
prot | VRPPDKNNRFVGLYCTLV… | protein sequence of the consensus sequence |
dna | GGTTAGACCACCCGATAA… | dna sequence of the consensus sequence |
reference_barcode | CTGACACGTCCTGAAG | barcode of the identical consensus amplicon used for annotation |
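Because amplicons.effect is tab-separated, it can be summarised with the standard library. The following sketch counts how many consensus amplicons carry each protein-level mutation. It assumes the columns appear in the order of the table above; whether the real file begins with a header line is not stated here, so the code skips one only if it is present. The function names are hypothetical.

```python
import csv
from collections import Counter

# Column order taken from the table; treat it as an assumption and
# check it against a real amplicons.effect file before relying on it.
COLUMNS = ["barcode", "sample", "prot_mutation", "prot_start", "prot_end",
           "nuc_mutation", "nuc_start", "nuc_end", "prot", "dna",
           "reference_barcode"]


def read_effects(handle):
    """Yield one dict per row of a tab-separated amplicons.effect stream."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0] == "barcode":
            continue  # skip blank lines and an optional header line
        yield dict(zip(COLUMNS, fields))


def count_protein_mutations(handle):
    """Count how many consensus amplicons carry each protein-level mutation."""
    return Counter(row["prot_mutation"] for row in read_effects(handle))
```

A typical call would open the file with `open(path, newline="")` and pass the handle to `count_protein_mutations`, then use `most_common()` on the result to rank mutations by the number of supporting barcodes.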
Other files produced by RM-seq include:
File name | Description |
---|---|
amplicons.barcodes | Table with the count of each barcode sequence |
amplicons.fna | Multifasta file containing all the consensus nucleotide sequences (the header of each sequence is the barcode) |
amplicons.faa | Multifasta file containing all the consensus protein sequences (the header of each sequence is the barcode) |
amplicons.fna.cdhit | Multifasta file containing all the unique consensus nucleotide sequences (the header of each sequence is the barcode) |
amplicons.faa.cdhit | Multifasta file containing all the unique consensus amino acid sequences (the header of each sequence is the barcode) |
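The multifasta outputs can be loaded into a barcode-to-sequence mapping for downstream checks. This is a minimal pure-Python sketch, assuming each header line holds only the barcode as described in the table; Biopython's `SeqIO` would work equally well. The function names are hypothetical, and counting exactly identical sequences here is a simple illustration, not a reimplementation of the cd-hit step.

```python
def read_fasta(handle):
    """Parse a multi-FASTA stream into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()  # header text, expected to be the barcode
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


def count_unique_sequences(records):
    """Return the number of distinct sequences among the parsed records."""
    return len(set(records.values()))
```

Comparing `count_unique_sequences` on amplicons.fna with the number of records in amplicons.fna.cdhit gives a quick sanity check, though the two numbers need not match if the clustering step does more than collapse exact duplicates.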
Issues
Please report problems to the Issues Page.