Snakemake: Missing input files for rule all

Posted 2024-04-24 06:36:52


I am trying to build a Snakemake pipeline for biosynthetic gene cluster detection, but I am struggling with this error:

Missing input files for rule all:
antismash-output/Unmap_09/Unmap_09.txt
antismash-output/Unmap_12/Unmap_12.txt
antismash-output/Unmap_18/Unmap_18.txt

and so on for many more files. As far as I can tell, the file generation in the Snakefile should work:

workdir: config["path_to_files"]
wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = config["samples"]

rule all:
    input:
        expand("antismash-output/{sample}/{sample}.txt", sample = config["samples"])

# merging the paired end reads (either fasta or fastq) as prodigal only takes single end reads
rule pear:
    input:
        forward = "{sample}{separator}1.{extension}",
        reverse = "{sample}{separator}2.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "pear -f {input.forward} -r {input.reverse} -o {output} -t 21"

# If single end then move them to merged_reads directory
rule move:
    input:
        "{sample}.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    shell:
        "cp {path}/{sample}.{extension} {path}/merged_reads/"

# Setting the rule order on the 2 above rules which should be treated equally and only one run.
ruleorder: pear > move
# annotating the metagenome with prodigal#. Can be done inside antiSMASH but prefer to do it out
rule prodigal:
    input:
        "merged_reads/{sample}.{extension}"

    output:
        gbk_files = "annotated_reads/{sample}.gbk",
        protein_files = "protein_reads/{sample}.faa"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "prodigal -i {input} -o {output.gbk_files} -a {output.protein_files} -p meta"

# running antiSMASH on the annotated metagenome
rule antiSMASH:
    input:
        "annotated_reads/{sample}.gbk"

    output:
        touch("antismash-output/{sample}/{sample}.txt")

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "antismash --knownclusterblast --subclusterblast --full-hmmer --smcog --outputfolder antismash-output/{wildcards.sample}/ {input}"

As an example, my config YAML file looks like this:

file_extension: fastq
path_to_files: /home/lamma/ABR/Each_reads
samples:
- Unmap_14
- Unmap_55
- Unmap_37
separator: _

I can't see where I went wrong in the Snakefile to produce such an error. Apologies for the simple question, I'm a beginner.


2 Answers

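The problem is that the global wildcard constraint for sample is set incorrectly: config["samples"] is a list, whereas a wildcard constraint must be a single regular expression. Joining the sample names with | gives one: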

wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = '|'.join(config["samples"])  # <  this should fix the problem
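To see why the joined string behaves as a constraint, here is a minimal sketch in plain Python, outside Snakemake. The config dict mirrors the YAML from the question, matches_constraint is a hypothetical helper, and re.fullmatch only approximates how Snakemake applies the regex to a wildcard. The re.escape call is an extra precaution that is not part of the answer above.

```python
import re

# Hypothetical config mirroring the YAML shown in the question.
config = {"samples": ["Unmap_14", "Unmap_55", "Unmap_37"]}

# Join the sample names into one regex alternation.
# re.escape is an added precaution (assumption) in case a sample
# name ever contains regex metacharacters such as "." or "+".
sample_regex = "|".join(re.escape(s) for s in config["samples"])


def matches_constraint(name, regex=sample_regex):
    """Return True if `name` matches the constraint regex in full."""
    return re.fullmatch(regex, name) is not None


print(sample_regex)
print(matches_constraint("Unmap_14"))  # a listed sample
print(matches_constraint("Unmap_99"))  # not in the list
```

Only names that appear in the list pass the check, which is the behaviour the constraint is meant to enforce on the sample wildcard.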

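Right after that, a second problem shows up with the extension and separator wildcards. Snakemake can only infer what these should be from other file names; you cannot actually set their values through wildcard constraints. We can use f-string syntax to fill in the values instead: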

rule pear:
    input:
        forward = f"{{sample}}{config['separator']}1.{{extension}}",
        reverse = f"{{sample}}{config['separator']}2.{{extension}}"
    ...

and:

rule prodigal:
    input:
        f"merged_reads/{{sample}}.{config['file_extension']}"
    ...

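If wildcard constraints confuse you, look up Snakemake regex; if you are unsure about the f"" syntax, and when to use a single { versus a double {{ to escape it, look for a blog post on f-strings.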
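As a quick illustration of the escaping rule, the sketch below evaluates the two patterns from the rules above in plain Python. The config dict is a stand-in for the YAML in the question. Single braces are substituted immediately by Python, while doubled braces come out as literal single braces, which Snakemake then treats as wildcards.

```python
# Stand-in for the values loaded from the config YAML in the question.
config = {"separator": "_", "file_extension": "fastq"}

# {{sample}} and {{extension}} are escaped, so they survive as wildcards;
# {config['separator']} is evaluated by Python right away.
forward_pattern = f"{{sample}}{config['separator']}1.{{extension}}"

# Here the extension is filled in from the config, leaving only
# the sample wildcard for Snakemake to resolve.
merged_pattern = f"merged_reads/{{sample}}.{config['file_extension']}"

print(forward_pattern)  # {sample}_1.{extension}
print(merged_pattern)   # merged_reads/{sample}.fastq
```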

Hope this helps!

(Since I can't comment yet...) There may be a problem with your relative paths; we can't see where the files actually are.

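One way to debug this is to build absolute paths in input: using config["path_to_files"]. This gives you a clearer error message about where Snakemake expects the files to be, since input and output files are resolved relative to the working directory.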
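A related check can be run outside Snakemake. The sketch below builds the absolute paths of the paired-end read files from the config and reports which ones exist. The expected_inputs helper is hypothetical, the config dict copies the YAML from the question, and the _1/_2 naming follows the pear rule; adjust it if your files are named differently.

```python
import os

# Copy of the config YAML from the question, as a plain dict.
config = {
    "path_to_files": "/home/lamma/ABR/Each_reads",
    "separator": "_",
    "file_extension": "fastq",
    "samples": ["Unmap_14", "Unmap_55", "Unmap_37"],
}


def expected_inputs(cfg):
    """Return absolute paths of the forward/reverse read files per sample."""
    paths = []
    for sample in cfg["samples"]:
        for mate in ("1", "2"):
            name = f"{sample}{cfg['separator']}{mate}.{cfg['file_extension']}"
            paths.append(os.path.join(cfg["path_to_files"], name))
    return paths


# Report which of the expected input files are actually on disk.
for path in expected_inputs(config):
    status = "found" if os.path.exists(path) else "MISSING"
    print(f"{status}: {path}")
```

Any path reported as MISSING points to a mismatch between the config values and the actual file names or directory.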
