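I'm very new to snakemake and not very fluent in python either (so sorry, this may be a very basic, stupid question):
I'm currently building a pipeline that analyzes a set of BAM files with atlas. These BAM files are in different folders and should not be moved to a common folder. I therefore decided to provide a sample list like the one below (this is just an example; in reality the samples may be on completely different drives):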
Sample Path
Sample1 /some/path/to/my/sample/
Sample2 /some/different/path/
I load it through my config.yaml.
Now to my Snakefile:
import pandas as pd

# define configfile with paths etc.
configfile: "config.yaml"

# read in the dataframe and define Sample and Path
SAMPLES = pd.read_table(config["sample_file"])
BAMFILE = SAMPLES["Sample"]
PATH = SAMPLES["Path"]

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=PATH, sample=BAMFILE)

# this works like a charm as long as I use the zip function in the rules 'all' and 'summary':
rule indexBam:
    input:
        "{path}{sample}.bam"
    output:
        "{path}{sample}.bam.bai"
    shell:
        "samtools index {input}"

# the following rule works as long as I give the specific folder of a sample instead of {path}:
rule bamdiagnostics:
    input:
        bam="{path}{sample}.bam",
        bai=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE)
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        "{config[atlas]} task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        index=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE),
        bamd=expand("analysis/BAMDiagnostics/{sample}_approximateDepth.txt", sample=BAMFILE)
    output:
        "{path}{sample}.summary.txt"
    shell:
        "echo -e '{input.index} {input.bamd}' > {output}"
I get the following error:
WildcardError in line 28 of path/to/my/Snakefile: Wildcards in input files cannot be determined from output files: 'path'
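Snakemake fills in a rule's wildcards by matching the requested output file against that rule's output patterns, so every wildcard used in `input:` must also appear in `output:`. The plain-Python model below is a simplified illustration, not Snakemake's actual implementation: it extracts the `{name}` wildcards from each pattern of the bamdiagnostics rule and reports the ones that occur only in the input. (The `bai=expand(...)` input is left out because expand() has already substituted its values, so it contains no wildcards.)

```python
import re

# Matches "{name}" and "{name,regex}" placeholders.
# Simplification: escaped braces such as "{{...}}" are not handled.
WILDCARD = re.compile(r"\{(\w+)(?:,[^}]*)?\}")


def wildcard_names(pattern: str) -> set[str]:
    """Return the set of wildcard names used in a file pattern."""
    return set(WILDCARD.findall(pattern))


def unresolvable_input_wildcards(inputs: list[str], outputs: list[str]) -> set[str]:
    """Wildcards that appear in some input pattern but in no output pattern."""
    in_names = set().union(*(wildcard_names(p) for p in inputs))
    out_names = set().union(*(wildcard_names(p) for p in outputs))
    return in_names - out_names


# Patterns copied from the bamdiagnostics rule above.
inputs = ["{path}{sample}.bam"]
outputs = [
    "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
    "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log",
]
print(unresolvable_input_wildcards(inputs, outputs))  # {'path'}
```

Only `sample` can be recovered from a requested `analysis/BAMDiagnostics/...` file; `path` never appears there, which is what the error message complains about.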
Can anyone help me?
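- I tried to solve this with join or by writing input functions, but I don't think I'm skilled enough to spot my mistake...
- I guess the problem is that my summary rule doesn't contain the tuple with {path} for the bamdiagnostics output (because that output is stored somewhere else), so it can't be connected to the input file, or something like that...
- Expanding the input of my bamdiagnostics rule makes the code work, but of course it turns every per-sample input into every per-sample output and creates a big mess:
In this case, both BAM files are used to create each output file. This is wrong, as the samples AND the outputs are to be treated independently.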
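The mess described in the last point comes from how expand() builds its list: it is evaluated once, when the Snakefile is parsed, so every job of the rule receives the same complete list no matter which `{sample}` it is producing. By default Snakemake's expand() combines the values with the Cartesian product (`itertools.product`), and passing `zip` pairs them element-wise instead. The function below is a simplified stand-in written for illustration, not Snakemake's own code, using the two example samples from the question.

```python
from itertools import product


def expand(pattern, combinator=product, **values):
    """Simplified stand-in for Snakemake's expand(): fill `pattern` with every
    combination of the given values (Cartesian product by default, or
    element-wise pairing when combinator=zip)."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in combinator(*(values[k] for k in keys))
    ]


PATH = ["/some/path/to/my/sample/", "/some/different/path/"]
BAMFILE = ["Sample1", "Sample2"]

# Default (product): every path is combined with every sample, giving 4 names,
# two of which point to files that do not exist.
everything = expand("{path}{sample}.bam", path=PATH, sample=BAMFILE)
print(everything)

# zip: the i-th path is paired with the i-th sample, giving the 2 real files.
paired = expand("{path}{sample}.bam", zip, path=PATH, sample=BAMFILE)
print(paired)
```

Even the zipped list contains both BAM files, so using it as the `bam` input hands both files to every bamdiagnostics job; a per-sample input needs something evaluated per job instead.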
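According to the atlas documentation, it looks like you need to run each rule separately for each sample; the complication here is that each sample lives in a different path.
I modified your script to work for the case above (see DAG). The variables at the beginning of the script were renamed to make more sense. config was removed for demonstration purposes, and the pathlib library was used (instead of …). pathlib is not necessary, but it helps me keep my sanity. The shell commands were modified to avoid config.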
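The per-sample approach described in the answer (keep only `{sample}` as a wildcard and look up each sample's folder) could be sketched as follows. This is not the answerer's script; it assumes a tab-separated sample sheet with `Sample` and `Path` columns as in the question, and the names `read_sample_sheet`, `bam_for_sample`, `bai_for_sample` and `SAMPLE_DIRS` are hypothetical. Snakemake calls a function given under `input:` once per job with that job's `wildcards` object; here `types.SimpleNamespace` imitates that object so the code runs as plain Python.

```python
import csv
import io
from pathlib import Path
from types import SimpleNamespace


def read_sample_sheet(handle) -> dict[str, Path]:
    """Map each sample name to its folder, read from a tab-separated sheet
    with 'Sample' and 'Path' columns (same layout as in the question)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Sample"]: Path(row["Path"]) for row in reader}


def bam_for_sample(wildcards) -> str:
    """Hypothetical input function: look up the BAM file for the job's
    {sample} wildcard, so {path} never has to be a wildcard."""
    return str(SAMPLE_DIRS[wildcards.sample] / f"{wildcards.sample}.bam")


def bai_for_sample(wildcards) -> str:
    """Index file belonging to the same sample's BAM file."""
    return bam_for_sample(wildcards) + ".bai"


# In-memory copy of the example sample sheet from the question.
sheet = io.StringIO(
    "Sample\tPath\n"
    "Sample1\t/some/path/to/my/sample/\n"
    "Sample2\t/some/different/path/\n"
)
SAMPLE_DIRS = read_sample_sheet(sheet)

# Snakemake would pass a real wildcards object; SimpleNamespace imitates it.
job = SimpleNamespace(sample="Sample2")
print(bam_for_sample(job))  # /some/different/path/Sample2.bam
print(bai_for_sample(job))  # /some/different/path/Sample2.bam.bai
```

In the Snakefile, bamdiagnostics would then declare `bam=bam_for_sample` and `bai=bai_for_sample` as its inputs and keep only `{sample}` in its output patterns, so each job sees exactly one BAM file and its index.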