hwo在snakem中使用plink定义两个规则

2024-05-19 03:23:49 发布

您现在位置:Python中文网/ 问答频道 /正文

我将演示一个简单的例子,我有两个文件命名为a.map和a.ped这是plink格式。我想使用两个命令,第一个转换为bfile格式,第二个转换为raw格式。你知道吗

我的文件:a.map,a.ped:

> $ cat a.map

1 snp1 0 1
1 snp2 0 2
1 snp3 0 3


> $ cat a.ped 

1 1 0 0 1  0  1 1  2 2  1 1
1 2 0 0 2  0  2 2  0 0  2 1
1 3 1 2 1  2  0 0  1 2  2 1
2 1 0 0 1  0  1 1  2 2  0 0
2 2 0 0 2  2  2 2  2 2  0 0
2 3 1 2 1  2  1 1  2 2  1 1

第一命令:

plink --file a --out b

我有四个文件:b.bed b.bamb、 家庭日志

(base) [dengfei@localhost plink-test]$ ls b*
b.bed  b.bim  b.fam  b.log

第二命令:

 plink --bfile b --out c --recodeA

我得到两个文件:

c.log  c.raw

我的问题是: 在第一个命令中,plink使用--out生成b.bim, b.bed,b.fam,但在snakemake中,我不能在第二个命令中使用名称。你知道吗

我的第一个蛇形锉刀:

rule bfile:
params:
    a1 = "a",
    a2 = "b"
shell:"plink --file {params.a1} --out {params.a2}"

它运行良好。你知道吗

    (base) [dengfei@localhost plink-test]$ snakemake -s test1.py 
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   bfile
    1

rule bfile:
    jobid: 0

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to b.log.
Options in effect:
  --file a
  --out b

63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.
Finished job 0.
1 of 1 steps (100%) done

当我在snakemake中添加另一个规则来运行第二个命令时,出现了问题,我的Snakefile:

rule all:
    input:
        "c.log","c.raw"
rule bfile:
    params:
        a1 = "a",
        a2 = "b"
    shell:"plink --file {params.a1} --out {params.a2}"
rule cfile:
    params:
        aa1 = "b",
    aa2 = "c"
    shell:"plink --bfile {params.aa1} --out {params.aa2} --recodeA"

它显示c.log和c.raw缺少输入

MissingInputException in line 1 of /home/dengfei/test/snakemake/plink-test/test1.py:
Missing input files for rule all:
c.log
c.raw

我不知道如何把这两条规则联系起来。任何建议都是伟大的!非常感谢你。你知道吗


Tags: 文件test命令lograwa1paramsout
2条回答

Snakemake使用inputoutput文件来标识工作流中的依赖项,它们在您的规则中丢失了。为规则bfilecfile定义inputoutput文件,然后在rule all中定义工作流的最终文件(或预期的外文件)。你知道吗

rule all:
    input:
        "c.log","c.raw"

rule bfile:
    input:
        "input files of rule bfile here"
    output:
        "output files of rule bfile here"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink  file {params.a1}  out {params.a2}"

rule cfile:
    input:
        "rule bfile outfiles"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink  bfile {params.aa1}  out {params.aa2}  recodeA"

我建议通过snakemake tutorial工作。你知道吗

在JeeYem的帮助下,正确的代码是:

rule all:
    input:
        "c.log","c.raw"

rule bfile:
    input:
        "a.map","a.ped"
    output:
        "b.bed","b.bim","b.fam"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink  file {params.a1}  out {params.a2}"

rule cfile:
    input:
        "b.bed","b.bim","b.fam"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink  bfile {params.aa1}  out {params.aa2}  recodeA"

我运行snakemake,它生成我想要的结果:

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   bfile
    1   cfile
    3

rule bfile:
    input: a.map, a.ped
    output: b.bed, b.bim, b.fam
    jobid: 2

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to b.log.
Options in effect:
   file a
   out b

63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
 file: b.bed + b.bim + b.fam written.
Finished job 2.
1 of 3 steps (33%) done

rule cfile:
    input: b.bed, b.bim, b.fam
    output: c.log, c.raw
    jobid: 1

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Note:  recodeA flag deprecated.  Use 'recode A ...'.
Logging to c.log.
Options in effect:
   bfile b
   out c
   recode A

63985 MB RAM detected; reserving 31992 MB for main workspace.
3 variants loaded from .bim file.
6 people (4 males, 2 females) loaded from .fam.
3 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 4 founders and 2 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.777778.
3 variants and 6 people pass filters and QC.
Among remaining phenotypes, 3 are cases and 0 are controls.  (3 phenotypes are
missing.)
 recode A to c.raw ... done.
Finished job 1.
2 of 3 steps (67%) done

localrule all:
    input: c.log, c.raw
    jobid: 0

Finished job 0.
3 of 3 steps (100%) done

相关问题 更多 >

    热门问题