A set of scripts to combine multiple breseq analyses and highlight variants of interest.
isolateparser
Usage
```
python isolateset_parser.py [-h] [-i FOLDER] [--no-fasta] [-w WHITELIST]
                            [-b BLACKLIST] [-m SAMPLE_MAP] [--filter-1000bp]

optional arguments:
  -h, --help            show this help message and exit
  -i FOLDER, --input FOLDER
                        The breseq folder to parse.
  --no-fasta            Whether to generate an aligned fasta file of all snps
                        in the breseq VCF file.
  -w WHITELIST, --whitelist WHITELIST
                        Samples not in the whitelist are ignored. Either a
                        comma-separated list of sample ids or a file with
                        each sample id occupying a single line.
  -b BLACKLIST, --blacklist BLACKLIST
                        Samples to ignore. See `--whitelist` for possible
                        input formats.
  -m SAMPLE_MAP, --sample-map SAMPLE_MAP
                        A file mapping sample ids to sample names. Use if the
                        subfolders in the breseqset folder are named
                        differently from the sample names. The file should
                        have two columns: `sampleId` and `sampleName`,
                        separated by a tab character.
  --filter-1000bp       Whether to filter out variants that occur within
                        1000bp of each other. Usually indicates a mapping
                        error.
```
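The sample-list options, the sample map, and the 1000bp filter described in the help text can be approximated as follows. This is a minimal sketch assuming only the input formats stated above, not isolateparser's actual code: `parse_sample_list`, `read_sample_map`, and `filter_close_variants` are hypothetical names, and the filter assumes "within 1000bp" means any other variant on the same sequence at most 1000bp away.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical helpers; these are not isolateparser's API.


def parse_sample_list(value: Optional[str]) -> Set[str]:
    """Parse a whitelist/blacklist given either as a path to a file with one
    sample id per line, or as a comma-separated string of sample ids."""
    if not value:
        return set()
    path = Path(value)
    # Anything that is not an existing file is treated as a comma-separated list.
    items = path.read_text().splitlines() if path.is_file() else value.split(",")
    return {item.strip() for item in items if item.strip()}


def read_sample_map(path: str) -> Dict[str, str]:
    """Read a tab-separated file with `sampleId` and `sampleName` columns."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["sampleId"]: row["sampleName"] for row in reader}


def filter_close_variants(
    variants: List[Tuple[str, int]], window: int = 1000
) -> List[Tuple[str, int]]:
    """Drop (seq_id, position) variants that have another variant on the same
    sequence within `window` bp (assumed interpretation of --filter-1000bp).
    Quadratic in the number of variants; fine for a sketch."""
    kept = []
    for seq_id, position in variants:
        has_neighbour = any(
            s == seq_id and p != position and abs(p - position) <= window
            for s, p in variants
        )
        if not has_neighbour:
            kept.append((seq_id, position))
    return kept
```

For example, `filter_close_variants([("c", 100), ("c", 600), ("c", 5000)])` keeps only `("c", 5000)`, because the first two variants are 500bp apart.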
Input
The scripts expect a folder of individual breseq runs, with each run folder named after its isolate/sample.
The scripts only need `output.vcf`, `annotated.gd`, and {
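A minimal sketch of how such an input folder could be scanned, assuming each subfolder is one breseq run named after its sample. `find_breseq_files` is a hypothetical helper: it searches each run recursively instead of assuming where breseq places the files, and it only checks for the two files named above, since the third is not specified here.

```python
from pathlib import Path
from typing import Dict

# Only the two file names given in the text; the third required file is unspecified.
REQUIRED_FILES = ("output.vcf", "annotated.gd")


def find_breseq_files(folder: str) -> Dict[str, Dict[str, Path]]:
    """Hypothetical helper: map each sample subfolder name to the required
    files found anywhere inside it. Samples missing a file are skipped."""
    samples: Dict[str, Dict[str, Path]] = {}
    for subfolder in sorted(Path(folder).iterdir()):
        if not subfolder.is_dir():
            continue  # ignore loose files in the top-level folder
        found = {}
        for name in REQUIRED_FILES:
            # Take the first match at any depth inside this run folder.
            match = next(subfolder.rglob(name), None)
            if match is not None:
                found[name] = match
        if len(found) == len(REQUIRED_FILES):
            samples[subfolder.name] = found
    return samples
```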
Output
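The scripts generate an Excel file in the breseq run folder with 4 sheets: `comparison`, `variant`, `coverage`, and `junction`. The `variant`, `coverage`, and `junction` sheets are simply concatenations of the corresponding tables from every sample in the breseq runs.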
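Concatenating per-sample tables amounts to stacking their rows while recording which sample each row came from. A minimal sketch with rows as plain dicts; `concatenate_tables` and the added `sampleName` column are hypothetical, not taken from isolateparser.

```python
from typing import Dict, List

Row = Dict[str, object]


def concatenate_tables(tables: Dict[str, List[Row]]) -> List[Row]:
    """Stack one table per sample into a single table, tagging each row with
    the sample it came from (hypothetical `sampleName` column)."""
    combined: List[Row] = []
    for sample_name, rows in tables.items():
        for row in rows:
            combined.append({"sampleName": sample_name, **row})
    return combined
```

If pandas and an Excel engine such as openpyxl are installed, the combined rows could be written as one sheet with `pandas.DataFrame(rows).to_excel(writer, sheet_name="variant", index=False)` inside a `with pandas.ExcelWriter(path) as writer:` block.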
Comparison Table
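A table in which each row represents a single mutation seen in the samples' call sets. Samples are represented by columns, each showing that sample's alternate sequence.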
| Sample1 | Sample2 | Sample3 | annotation | description | gene | locusTag | mutationCategory | position | presentIn | presentInAllSamples | ref | seq id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GG | GG | GG | intergenic (+65/+20) | putative lipoprotein/putative hydrolase | PFLU0045 - / - PFLU0046 | PFLU0045/PFLU0046 | small_indel | 45881 | 3 | 1 | G | NC_012660 |
| CC | CC | CC | intergenic (+17/-136) | microcin-processing peptidase 1. Unknown type peptidase. MEROPS family U62/hypothetical protein | PFLU0872 - / - PFLU0873 | PFLU0872/PFLU0873 | small_indel | 985333 | 3 | 1 | C | NC_012660 |
|  |  |  | intergenic (+57/+21) | hypothetical protein/putative helicase | PFLU3154 - / - PFLU3155 | PFLU3154/PFLU3155 | small_indel | 3447986 | 3 | 1 |  | NC_012660 |
| A | A | G | M350I (ATG-ATA) | putative GGDEF domain signaling protein | PFLU3571 - | PFLU3571 | snp_nonsynonymous | 3959631 | 2 | 0 | G | NC_012660 |
| A | A | C | T238P (ACC-CCC) | hybrid sensory histidine kinase in two-component regulatory system with UvrY | PFLU3777 - | PFLU3777 | snp_nonsynonymous | 4173231 | 1 | 0 | A | NC_012660 |
| G | G | GG | coding (322/1476 nt) | putative two-component system response regulator nitrogen regulation protein NR(I) | PFLU4443 - | PFLU4443 | small_indel | 4908233 | 1 | 0 | G | NC_012660 |
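The comparison table can be read as a pivot of every sample's calls on `(seq id, position)`. The sketch below is hypothetical (`build_comparison` and the `Calls` layout are assumptions, and the annotation columns are omitted). Two behaviours are inferred from the example rows rather than documented: a sample without a call shows the reference base (Sample3 shows `G` at 3959631), and `presentIn` counts the samples with a call, with `presentInAllSamples` set to 1 when that count equals the number of samples.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (seq_id, position) -> (ref, alt) for one sample.
Calls = Dict[Tuple[str, int], Tuple[str, str]]


def build_comparison(calls: Dict[str, Calls]) -> List[dict]:
    """Build one row per mutation with one column per sample."""
    samples = list(calls)
    keys = sorted({key for sample_calls in calls.values() for key in sample_calls})
    rows = []
    for key in keys:
        seq_id, position = key
        present = [s for s in samples if key in calls[s]]
        ref = calls[present[0]][key][0]
        row = {"seq id": seq_id, "position": position, "ref": ref}
        for sample in samples:
            call = calls[sample].get(key)
            # Inferred from the example table: uncalled samples show the reference.
            row[sample] = call[1] if call else ref
        row["presentIn"] = len(present)
        row["presentInAllSamples"] = int(len(present) == len(samples))
        rows.append(row)
    return rows
```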
Aligned FASTA Files
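The scripts also generate 3 aligned FASTA files (`breseq.snp.fasta`, `breseq.amino.fasta`, and `breseq.codon.fasta`) in which every nonsynonymous SNP in each sample is represented by its substituted base, amino acid, and codon, respectively.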
Example:
```
>reference
GA
>Sample1
AA
>Sample2
AA
>Sample3
GC
```
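The SNP alignment in the example is consistent with joining, in position order, the bases of the two `snp_nonsynonymous` rows of the comparison table (positions 3959631 and 4173231). A minimal sketch under that assumption: `snp_fasta` is hypothetical, it takes rows keyed like the comparison table's columns, and it does not cover the amino acid or codon files, which would need the annotation's amino acid and codon fields instead.

```python
from typing import List


def snp_fasta(rows: List[dict], samples: List[str]) -> str:
    """Hypothetical helper: concatenate the bases at nonsynonymous SNP
    positions into one aligned sequence for the reference and each sample."""
    snps = sorted(
        (row for row in rows if row["mutationCategory"] == "snp_nonsynonymous"),
        key=lambda row: (row["seq id"], row["position"]),
    )
    records = {"reference": "".join(row["ref"] for row in snps)}
    for sample in samples:
        records[sample] = "".join(row[sample] for row in snps)
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

Called with the two `snp_nonsynonymous` rows from the comparison table and `["Sample1", "Sample2", "Sample3"]`, this returns the same four records as the example above; indel rows such as the one at 4908233 are skipped.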