How to set the shell output directory in a Snakemake workflow

Asked 2025-04-14 18:36

In my shell command, the --output_dir option specifies the folder that the output files are written to.

But I keep getting the following error.

SyntaxError:
Not all output, log and benchmark files of rule bismark_cov contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
  File "extra_bismark_methyl_analysis.smk", line 35, in <module>
bismark_methylation_extractor {input.bam_path} --parallel 4 \
--paired-end --comprehensive \
--bedGraph --zero_based --output_dir {params.out_dir}

Here is the complete workflow I am using.

import os
import glob
from datetime import datetime

#import configs
configfile: "/lila/data/greenbaum/users/ahunos/apps/lab_manifesto/configs/config_snakemake_lilac.yaml"

# Define the preprocessed files directory  
preprocessedDir = '/lila/data/greenbaum/projects/methylSeq_Spectrum/data/preprocessed/WholeGenome_Methyl/OUTDIR/bismark/deduplicated/*.bam'
dir2='/lila/data/greenbaum/projects/methylSeq_Spectrum/data/preprocessed/Capture_Methyl/OUTDIR/bismark/deduplicated/*.bam'

# Create the pattern to match BAM files
def get_bams(nfcore_OUTDIR):
    bam_paths = glob.glob(nfcore_OUTDIR, recursive=True)
    return bam_paths

#combine bam files
bam_paths = get_bams(nfcore_OUTDIR=preprocessedDir) + get_bams(nfcore_OUTDIR=dir2)
print(bam_paths)

#get sample names
SAMPLES = [os.path.splitext(os.path.basename(f))[0] for f in bam_paths]
print(f"heres SAMPLES \n{SAMPLES}")


contexts=['CpG','CHH','CHG']
suffixes=['bismark.cov.gz','M-bias.txt', '.bedGraph.gz']

rule all:
    input:
        expand('results/{sample}/{sample}.{suffix}', sample=SAMPLES, suffix=suffixes, allow_missing=True),
        expand('results/{sample}/{sample}_splitting_report.txt', sample=SAMPLES,allow_missing=True),
        expand('results/{sample}/{C_context}_context_{sample}.txt', sample=SAMPLES, C_context=contexts,allow_missing=True)

rule bismark_cov:
    input:
        bam_path=lambda wildcards: wildcards.bam_paths
    output:
        'results/{sample}/{sample}.{suffix}',
        'results/{sample}/{sample}_splitting_report.txt',
        'results/{sample}/{C_context}_context_{sample}.txt'
    params:
        out_dir='results/{sample}'
    shell:
        """ 
bismark_methylation_extractor {input.bam_path} --parallel 4 \
--paired-end --comprehensive \
--bedGraph --zero_based --output_dir {params.out_dir}
        """

1 Answer

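The problem is that some of your output patterns contain extra wildcards ({suffix} and {C_context}) that the others lack. Going by the first output line, the rule would have to run three times per sample, once for each suffix. The middle output line, however, implies a single run per sample, and the third line with {C_context} conflicts in the same way. Snakemake requires every output of a rule to contain the same set of wildcards, which is what the error message is telling you.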
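To make the mismatch concrete, here is a small plain-Python sketch that pulls the wildcard names out of each output pattern from the question and compares them. It only imitates the idea behind Snakemake's check with a simple regular expression; `wildcard_names` is a made-up helper for illustration and is not part of Snakemake.

```python
import re

# Output patterns copied from rule bismark_cov in the question.
outputs = [
    "results/{sample}/{sample}.{suffix}",
    "results/{sample}/{sample}_splitting_report.txt",
    "results/{sample}/{C_context}_context_{sample}.txt",
]


def wildcard_names(pattern):
    """Return the set of {name} placeholders in a path pattern.

    Hypothetical helper: a simplification that ignores Snakemake's
    {name,regex} constraint syntax.
    """
    return set(re.findall(r"\{(\w+)\}", pattern))


wildcard_sets = [wildcard_names(p) for p in outputs]
for pattern, names in zip(outputs, wildcard_sets):
    print(f"{pattern} -> {sorted(names)}")

# A rule is only well-defined if every output has the same wildcard set.
consistent = all(names == wildcard_sets[0] for names in wildcard_sets)
print("consistent:", consistent)
```

The three patterns yield three different wildcard sets, so a job for one sample cannot know which {suffix} or {C_context} it is supposed to produce.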

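I'm not very familiar with bismark_methylation_extractor, but I assume a single run produces all of these suffixes and contexts. If so, one option is to write the outputs out explicitly: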

    output:
        'results/{sample}/{sample}.M-bias.txt',
        'results/{sample}/{sample}.bismark.cov.gz',
        'results/{sample}/{sample}.bedGraph.gz',
        'results/{sample}/{sample}_splitting_report.txt',

Alternatively, I think an expand call in the output section (like the one you use in the input of rule all) should also work.
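A rough sketch of the expand variant, written as a Snakefile fragment (Snakemake syntax, so it is not runnable as plain Python). It reuses `bam_paths` and the file naming from the question. Whether bismark_methylation_extractor really writes exactly these file names is an assumption carried over from the question, so check it against a real run. `BAM_BY_SAMPLE` is a hypothetical lookup added here because the rule needs some way to get from {sample} to a BAM path; the original `wildcards.bam_paths` refers to a wildcard that does not exist in any output. The leading dot is also dropped from the bedGraph suffix, since the pattern already supplies one.

```python
# Snakefile fragment: a sketch under the assumptions stated above.
# BAM_BY_SAMPLE is a hypothetical helper, not part of the original post.
BAM_BY_SAMPLE = {os.path.splitext(os.path.basename(f))[0]: f for f in bam_paths}

contexts = ['CpG', 'CHH', 'CHG']
suffixes = ['bismark.cov.gz', 'M-bias.txt', 'bedGraph.gz']

rule bismark_cov:
    input:
        bam_path=lambda wildcards: BAM_BY_SAMPLE[wildcards.sample]
    output:
        # allow_missing=True fills in suffix / C_context and leaves {sample}
        # as the only remaining wildcard in every output pattern.
        expand('results/{sample}/{sample}.{suffix}', suffix=suffixes, allow_missing=True),
        'results/{sample}/{sample}_splitting_report.txt',
        expand('results/{sample}/{C_context}_context_{sample}.txt', C_context=contexts, allow_missing=True)
    params:
        out_dir='results/{sample}'
    shell:
        "bismark_methylation_extractor {input.bam_path} --parallel 4 "
        "--paired-end --comprehensive "
        "--bedGraph --zero_based --output_dir {params.out_dir}"
```

With this layout every output depends only on {sample}, so the rule runs once per sample and declares all of the files that run is expected to create.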
