Split BAM files by cluster, then merge them per cluster using a checkpoint

import os

SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] }

# collect the unique cluster ids across all samples
CLUSTERS = []
for sample in SAMPLE_cluster:
    CLUSTERS.extend(SAMPLE_cluster[sample])
CLUSTERS = sorted(set(CLUSTERS))

rule all:
    input: expand("01merged_bam/{cluster_id}.bam", cluster_id = CLUSTERS)

checkpoint split_bam:
    input: "{sample}.bam"
    output: directory("01split_bam/{sample}/")
    shell:
        """
        split_bam.sh {input}
        """

## split_bam.sh splits the bam file into "01split_bam/{sample}/{sample}_{cluster_id}.bam"

def merge_bam_input(wildcards):
    checkpoint_output = checkpoints.split_bam.get(**wildcards).output[0]
    return expand("01split_bam/{sample}/{sample}_{{cluster_id}}.bam",
                  sample = glob_wildcards(os.path.join(checkpoint_output, "{sample}_{cluster_id}.bam")).sample)

rule merge_bam_per_cluster:
    input: merge_bam_input
    output: "01merged_bam/{cluster_id}.bam"
    log: "00log/{cluster_id}.merge_bam.log"
    threads: 2
    shell:
        """
        samtools merge -@ 2 -r {output} {input}
        """

1 Answer

User

#1 · Posted on 2024-06-17 09:16:08

I decided not to use a checkpoint and to use an input function to collect the inputs instead.


SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" :  [ "1" ], "SampleC" : [ "1", "2" ] }

# reverse the mapping: cluster id -> samples that contain it
cluster_sample = {'1': ['SampleA', 'SampleB', 'SampleC'], '2': ['SampleA', 'SampleC'], '3': ['SampleA']}

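rule split_bam:
    input: "{sample}.bam"
    # one marker file per sample, so the {sample} wildcard also appears in the output
    output: "01split_bam/{sample}.split.touch"
    shell:
        """
        split_bam {input}
        touch {output}
        """

rule index_split_bam:
    input: "01split_bam/{sample}.split.touch"
    # must match the paths requested by get_merge_bam_input below
    output: "01split_bam/{sample}/{sample}_{cluster_id}.bam.bai"
    shell:
        """
        samtools index 01split_bam/{wildcards.sample}/{wildcards.sample}_{wildcards.cluster_id}.bam
        """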

def get_merge_bam_input(wildcards):
    samples = cluster_sample[wildcards.cluster_id]
    return expand("01split_bam/{sample}/{sample}_{{cluster_id}}.bam.bai", sample = samples)


rule merge_bam_per_cluster:
    input: get_merge_bam_input
    output: "01merged_bam/{cluster_id}.bam"
    params:
        # drop the ".bai" suffix so samtools merge receives the BAM files themselves
        bam = lambda wildcards, input: " ".join(input).replace(".bai", "")
    log: "00log/{cluster_id}.merge_bam.log"
    threads: 2
    shell:
        """
        samtools merge -@ 2 -r {output} {params.bam}
        """

It seems to work.
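
The reverse mapping in this answer is typed by hand, so it can drift out of sync with SAMPLE_cluster. Below is a minimal sketch in plain Python of deriving it instead, and of the list of index files that the input function asks for per cluster. The helper names invert_mapping and merge_inputs are hypothetical, and merge_inputs is only a stand-in for the expand() call, so it can be run without Snakemake.

```python
from collections import defaultdict

SAMPLE_cluster = {"SampleA": ["1", "2", "3"], "SampleB": ["1"], "SampleC": ["1", "2"]}


def invert_mapping(sample_to_clusters):
    """Return {cluster_id: [samples that contain that cluster]} (hypothetical helper)."""
    cluster_to_samples = defaultdict(list)
    for sample, cluster_ids in sample_to_clusters.items():
        for cluster_id in cluster_ids:
            cluster_to_samples[cluster_id].append(sample)
    return dict(cluster_to_samples)


def merge_inputs(cluster_id, cluster_to_samples):
    """List the .bai files one merged cluster depends on (stand-in for expand())."""
    return [
        f"01split_bam/{sample}/{sample}_{cluster_id}.bam.bai"
        for sample in cluster_to_samples[cluster_id]
    ]


cluster_sample = invert_mapping(SAMPLE_cluster)
print(cluster_sample)
print(merge_inputs("2", cluster_sample))
```

In the Snakefile, cluster_sample = invert_mapping(SAMPLE_cluster) could replace the hand-written dictionary, leaving get_merge_bam_input unchanged.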
