使用snakemake pairended bwa alignmen

import os import snakemake.io import glob (SAMPLES,READS,) = glob_wildcards("raw/{sample}_{read}.fastq.gz") READS=["1","2"] REF="/data/data/reference/refs/ucsc.hg19.fasta.gz" rule all: input: expand("raw/{sample}.bam",sample=SAMPLES) rule bwa_map: input: ref=REF, r1=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS), r2=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS) output: "raw/{sample}.bam" shell: "bwa mem -M -t 8 {input.ref} {input.r1} {input.r2} | samtools view -Sbh - > {output}"

1条回答

网友

1楼 · 发布于 2024-06-17 08:25:45

第一个规则（这里是rule all）指定要在snakemake工作流期间创建的文件。在

对于给定的文件f，在rule all::input中，snakemake将查看所有规则并尝试找到一个可以创建f（基于每个规则的output段上的模式匹配）。在

假设f = raw/my_sample.bam

一旦snakemake找到可以创建f的规则，它将确定生成该文件所需的所有输入文件。在

所以在这里，snakemake发现f = raw/my_sample.bam可以由rule bwa_map创建（因为f与模式raw/<anything>.bam匹配），然后根据input段确定生成{}所需的文件。在

Snakemake认为：我可以做raw/my_sample.bam，如果我有文件ref="/data/data/reference/refs/ucsc.hg19.fasta.gz" 文件r1=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS) 文件r2=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS)

在expand中，r1将sample扩展到示例中的每个值，read扩展到读取中的每个值。但是在SAMPLES中有10个值，READS中有2个值，因此r1将为它试图生成的每个输出文件扩展到20个不同的文件路径。它忽略了sample子句中的sample通配符（因为您在expand调用中重写了它）。在

必须让output子句中定义的通配符冒泡到input子句：

import os
import snakemake.io
import glob

(SAMPLES,READS,) = glob_wildcards("raw/{sample}_{read}.fastq.gz")
READS=["1","2"]
REF="/data/data/reference/refs/ucsc.hg19.fasta.gz"

rule all:
    input: expand("raw/{sample}.bam",sample=SAMPLES)

rule bwa_map:
    input:
        ref=REF,
        # determine `r1` based on the {sample} wildcard defined in `output`
        # and the fixed value `1` to indicate the read direction
        r1="raw/{sample}_1.fastq.gz",
        # determine `r2` based on the {sample} wildcard similarly
        r2="raw/{sample}_2.fastq.gz"

    output: "raw/{sample}.bam"

    # better to pass in the threads than to hardcode them in the shell command
    threads: 8

    shell: "bwa mem -M -t {threads} {input.ref} {input.r1} {input.r2} | samtools view -Sbh - > {output}"

我建议您看看bwa对齐规则是如何在snakemake wrappers资源中编写的（您也可以考虑使用包装器）：https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html

主题外：从代码评审的角度来看，我质疑您为什么要将对齐的数据输出到您的raw目录？将对齐的数据输出到align或aligned会更有意义吗？您似乎还从不使用的包中导入。在

相关问题更多 >

编程相关推荐

热门问题

热门文章