How to set the output directory of a shell command in a Snakemake workflow
In my command, the `--output_dir` option specifies the folder the files are written to, but I keep getting this error:
```
SyntaxError:
Not all output, log and benchmark files of rule bismark_cov contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
File "extra_bismark_methyl_analysis.smk", line 35, in <module>
bismark_methylation_extractor {input.bam_path} --parallel 4 \
--paired-end --comprehensive \
--bedGraph --zero_based --output_dir {params.out_dir}
```
Here is the full workflow I am using:
```python
import os
import glob
from datetime import datetime

# import configs
configfile: "/lila/data/greenbaum/users/ahunos/apps/lab_manifesto/configs/config_snakemake_lilac.yaml"

# Define the preprocessed files directory
preprocessedDir = '/lila/data/greenbaum/projects/methylSeq_Spectrum/data/preprocessed/WholeGenome_Methyl/OUTDIR/bismark/deduplicated/*.bam'
dir2 = '/lila/data/greenbaum/projects/methylSeq_Spectrum/data/preprocessed/Capture_Methyl/OUTDIR/bismark/deduplicated/*.bam'

# Create the pattern to match BAM files
def get_bams(nfcore_OUTDIR):
    bam_paths = glob.glob(nfcore_OUTDIR, recursive=True)
    return bam_paths

# combine bam files
bam_paths = get_bams(nfcore_OUTDIR=preprocessedDir) + get_bams(nfcore_OUTDIR=dir2)
print(bam_paths)

# get sample names
SAMPLES = [os.path.splitext(os.path.basename(f))[0] for f in bam_paths]
print(f"heres SAMPLES \n{SAMPLES}")

contexts = ['CpG', 'CHH', 'CHG']
suffixes = ['bismark.cov.gz', 'M-bias.txt', '.bedGraph.gz']

rule all:
    input:
        expand('results/{sample}/{sample}.{suffix}', sample=SAMPLES, suffix=suffixes, allow_missing=True),
        expand('results/{sample}/{sample}_splitting_report.txt', sample=SAMPLES, allow_missing=True),
        expand('results/{sample}/{C_context}_context_{sample}.txt', sample=SAMPLES, C_context=contexts, allow_missing=True)

rule bismark_cov:
    input:
        bam_path=lambda wildcards: wildcards.bam_paths
    output:
        'results/{sample}/{sample}.{suffix}',
        'results/{sample}/{sample}_splitting_report.txt',
        'results/{sample}/{C_context}_context_{sample}.txt'
    params:
        out_dir='results/{sample}'
    shell:
        """
        bismark_methylation_extractor {input.bam_path} --parallel 4 \
        --paired-end --comprehensive \
        --bedGraph --zero_based --output_dir {params.out_dir}
        """
```
Answer
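The problem is that some of your outputs contain extra wildcards (`{suffix}` and `{C_context}`) that the others lack. Snakemake creates one job per combination of wildcard values, so the first output line makes the rule run three times per sample (once per suffix). The middle output line, however, only contains `{sample}`, so it corresponds to one job per sample; the same applies to the `{C_context}` line. All three suffix jobs for a sample would therefore write the same `_splitting_report.txt`, which is exactly what the error message is guarding against.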
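To make the requirement concrete, here is a small pure-Python sketch that mimics the check. It is not Snakemake's actual implementation; the helpers `wildcard_names` and `outputs_share_wildcards` are hypothetical, and the regex is simplified (it assumes wildcard constraints contain no `}`). It extracts the `{...}` names from each output pattern and reports whether all patterns use the same set.

```python
import re

# Simplified wildcard pattern: matches {name} or {name,constraint}.
# Assumption: constraints containing '}' (e.g. {x,\d{3}}) are not handled.
WILDCARD = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)(?:,[^}]*)?\}")


def wildcard_names(pattern):
    """Return the set of wildcard names used in a Snakemake-style path pattern."""
    return set(WILDCARD.findall(pattern))


def outputs_share_wildcards(patterns):
    """Hypothetical helper: True if every pattern uses exactly the same wildcards."""
    sets = [wildcard_names(p) for p in patterns]
    return all(s == sets[0] for s in sets)


# Output patterns of rule bismark_cov from the question
question_outputs = [
    "results/{sample}/{sample}.{suffix}",
    "results/{sample}/{sample}_splitting_report.txt",
    "results/{sample}/{C_context}_context_{sample}.txt",
]

# Explicit filenames, so {sample} is the only wildcard left
fixed_outputs = [
    "results/{sample}/{sample}.M-bias.txt",
    "results/{sample}/{sample}_splitting_report.txt",
    "results/{sample}/CpG_context_{sample}.txt",
]

for p in question_outputs:
    print(p, sorted(wildcard_names(p)))
print(outputs_share_wildcards(question_outputs))  # False
print(outputs_share_wildcards(fixed_outputs))     # True
```

The first pattern uses `{sample, suffix}`, the second only `{sample}`, and the third `{sample, C_context}`, so the sets disagree and Snakemake refuses to build the rule.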
I don't know `bismark_methylation_extractor` well, but I assume it produces one file per suffix and one per context. If so, you can write the outputs out explicitly:
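```python
output:
    'results/{sample}/{sample}.M-bias.txt',
    'results/{sample}/{sample}.bismark.cov.gz',
    'results/{sample}/{sample}.bedGraph.gz',
    'results/{sample}/{sample}_splitting_report.txt',
    'results/{sample}/CpG_context_{sample}.txt',
    'results/{sample}/CHH_context_{sample}.txt',
    'results/{sample}/CHG_context_{sample}.txt'
```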
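Alternatively, I think using `expand` in the output section (with `allow_missing=True`, as you did in the input of `rule all`) should also work.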
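The sketch below illustrates what `expand(..., allow_missing=True)` does to the output patterns. `expand_allow_missing` is a hypothetical pure-Python stand-in written only for illustration, not Snakemake's `expand`: it fills in the wildcards you pass with every combination of values and leaves the rest, here `{sample}`, untouched. The leading dot of `'.bedGraph.gz'` from the question is dropped here; with the pattern `{sample}.{suffix}` it would otherwise produce `{sample}..bedGraph.gz`.

```python
import itertools


class _KeepMissing(dict):
    """format_map helper: leave wildcards that were not supplied, e.g. {sample}, as-is."""

    def __missing__(self, key):
        return "{" + key + "}"


def expand_allow_missing(pattern, **values):
    """Hypothetical stand-in for expand(..., allow_missing=True): substitute every
    combination of the given values and keep all other wildcards in place."""
    keys = list(values)
    results = []
    for combo in itertools.product(*(values[k] for k in keys)):
        results.append(pattern.format_map(_KeepMissing(zip(keys, combo))))
    return results


suffixes = ["bismark.cov.gz", "M-bias.txt", "bedGraph.gz"]  # no leading dot
contexts = ["CpG", "CHH", "CHG"]

outputs = (
    expand_allow_missing("results/{sample}/{sample}.{suffix}", suffix=suffixes)
    + ["results/{sample}/{sample}_splitting_report.txt"]
    + expand_allow_missing("results/{sample}/{C_context}_context_{sample}.txt", C_context=contexts)
)
for path in outputs:
    print(path)
```

Every resulting path contains only the `{sample}` wildcard, so one job per sample owns all seven files. In the Snakefile itself, the corresponding output entries would be written with Snakemake's own `expand`, e.g. `expand('results/{sample}/{sample}.{suffix}', suffix=suffixes, allow_missing=True)`.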