在规则中的函数中使用通配符

2024-05-13 23:25:27 发布

您现在位置:Python中文网/ 问答频道 /正文

我想对我生产线中的每个样品做一次特殊的修整

我在下面的配置中有一个示例列表:

#######################
#      Config data    # 
#######################

samples:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004:
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004:
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004:
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: 
  ....

在我的配置中还有一个列表,其中包含示例和引物,作为trim5和trim3的字典

trim5:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGCGGTTATCTCGTATGCCGTCTTCTGCTTG 
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTATAACCATCTCGTATGCCGTCTTCTGCTTG
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGACTTGGATCTCGTATGCCGTCTTCTGCTTG
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGTCCAAATCTCGTATGCCGTCTTCTGCTTG
 .....

因此,我做了一个函数,可以选择与此示例链接的引物

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim5():
      for trim3 in config['trim5']:
             if {wildcards.sample} == trim5:
                print (config['trim5'])
                print (config['trim5'][trim5])
                    return ( config['trim5'][trim5] )
             else: 
                    continue 
         
            

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim3():
      for trim3 in config['trim3']:      
            if {wildcards.sample} == trim3:
                print (config['trim3'])
                print (config['trim3'][trim3])
                    return ( config['trim3'][trim3] )
             else: 
                    continue
            

  

rule cutadapt_remove_adaptater_trimm:
    priority: 0
    input:
        reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
    output:
        fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
        fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
        qc= "../results/trimmed/{sample}_qc.txt"
    params:
        adapters="-g %s -a %s -a %s -a %s  -G %s -A %s -A %s -A %s -A %s  "%(get_index_trim5(), config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG'],get_index_trim3(), get_index_trim5(), config['adaptaters']['adap_R2'], get_index_trim3(), config['adaptaters']['PolyG']), 
        extra="--minimum-length 100 -q 20"
    log:
        "../results/logs/trimmed/{sample}_trimmed.log"
    benchmark : 
        "../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
    message:
        """
        --- Trimming on {wildcards.sample} {params.adapters} in process ---
        """
    threads: 4
    resources:
        mem_mb=25000
    shell:
      "cutadapt {params.adapters} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"

问题是我需要在我的函数中检查通配符是否与primer对应,但我无法将wildcards.sample作为参数传递给我的函数。 我使用通配符有些困难

你能帮我吗

提前谢谢,祝你今天愉快, 阿卡

据强强刮刮刮刮刮刮刮方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方###################

            EDIT Solution

据强强刮刮刮刮刮刮刮方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方方###################

你好,

最后我找到了解决办法。 第一个问题是我使用了函数get_index_trim5(),但我们不需要()

第二个问题是,我试图用%s调用一个函数,但它返回了该函数的一个对象,而不是我想要的键

所以,它不是完美的,但它解决了我的问题,我决定在params部分将我的适配器变量拆分为不同的变量:对于没有函数get_index_trim5get_index_trim3的经典适配器,R1和R2,以及直接使用函数的其他两个变量

就这样我给我的情妇打了电话。在我对cutadapt的bash命令中

def get_index_trim5(wildcards):
    trim5 =wildcards.sample
    return config['trim5'][trim5]
 

def get_index_trim3(wildcards):
    trim3 =wildcards.sample
    return config['trim3'][trim3]


rule cutadapt_remove_adaptater_trimm:
    priority: 0
    input:
        reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
    output:
        fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
        fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
        qc= "../results/trimmed/{sample}_qc.txt"
    params:
        adapters_r1=" -a %s -a %s %s"%(config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG']),
        adapters_r2= "-A %s -A %s "%(config['adaptaters']['adap_R2'], config['adaptaters']['PolyG']),
        extra="--minimum-length 100 -q 20",
        trim5=get_index_trim5,
        trim3=get_index_trim3
    log:
        "../results/logs/trimmed/{sample}_trimmed.log"
    benchmark : 
        "../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
    message:
        """
        --- Trimming on {wildcards.sample} {params.adapters_r1}
        
        {params.adapters_r2}  in process ---

        {params.trim5} {params.trim3}]
        """
    threads: 4
    resources:
        mem_mb=25000
    shell:
      "cutadapt -g {params.trim5} {params.adapters_r1} -a {params.trim3} -G {params.trim5} {params.adapters_r2} -A {params.trim3} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"

阿卡


Tags: sample函数configgetindexparamsresultsadapters
1条回答
网友
1楼 · 发布于 2024-05-13 23:25:27

I can't pass the wildcards.sample as an argument to my function.

您可以通过将自定义函数与lambda表达式相结合来解决此问题:

def get_index_trim3(sample, config):
    for trim3 in config['trim3']:      
        if sample == trim3:
             print (config['trim3'])
             print (config['trim3'][trim3])
                 return ( config['trim3'][trim3] )
         else: 
             continue

rule cutadapt:
    input:
        lambda wildcards: get_index_trim3(wildcards.sample, config)
    ...

顺便说一下,我不知道您的具体情况,但通常不需要指定完整的适配器序列。使用第一个公共基础作为所有基础的总括,如:

cutadapt -a GATCGGAAGAGCAC ...

也许试着看看它是否有效

相关问题 更多 >