Can't get this regex to work in Snakemake wildcard_constraints

2 answers

User

#1 · edited 2024-05-13 09:00:42

If you don't want your string to start with RNASeq or DNaseSeq, you can do this:

r'^(?!RNASeq)(?!DNaseSeq).+'
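To see what that pattern accepts outside of Snakemake, here is a minimal sketch using Python's re module. The sample names and the helper is_allowed are made up for illustration; only the regex comes from the answer.

```python
import re

# Pattern from the answer: two negative lookaheads anchored at the start.
NOT_RNA_OR_DNASE = re.compile(r'^(?!RNASeq)(?!DNaseSeq).+')


def is_allowed(name: str) -> bool:
    """Return True when name does not start with RNASeq or DNaseSeq."""
    return NOT_RNA_OR_DNASE.search(name) is not None


# Hypothetical sample names, invented for this illustration.
print(is_allowed('RNASeq_rep1'))    # False
print(is_allowed('DNaseSeq_rep1'))  # False
print(is_allowed('ChIPSeq_rep1'))   # True
print(is_allowed('my_RNASeq'))      # True: RNASeq is not at the start
```

Both lookaheads are checked at the same position (the start of the string), so either prefix is enough to reject the name.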

User

#2 · edited 2024-05-13 09:00:42

I believe the following is what you are trying to achieve:

# Snakefile

rule sam_startswith_dna:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='dna.+'
    shell: 'touch {output}'

rule sam_not_startswith_dna:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='(?!dna).+'  # negative lookahead assertion
    shell: 'touch {output}'

rule bam_endswith_rna:
    output: '{pattern}.bam'
    wildcard_constraints: pattern='.+rna'
    shell: 'touch {output}'

rule bam_not_endswith_rna:
    output: '{pattern}.bam'
    wildcard_constraints: pattern='.+(?<!rna)'  # negative lookbehind assertion
    shell: 'touch {output}'
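The four constraints above can be checked in isolation with plain Python. This is only a sketch of the regex semantics: it applies re.fullmatch to a candidate wildcard value, which is a simplification and not necessarily how Snakemake performs the match. The helper matching_rules and the sample values are assumptions for illustration.

```python
import re

# Constraints copied from the Snakefile above, keyed by rule name.
CONSTRAINTS = {
    'sam_startswith_dna': r'dna.+',
    'sam_not_startswith_dna': r'(?!dna).+',   # negative lookahead
    'bam_endswith_rna': r'.+rna',
    'bam_not_endswith_rna': r'.+(?<!rna)',    # negative lookbehind
}


def matching_rules(wildcard_value: str, prefix: str) -> list:
    """Names of rules starting with prefix whose constraint matches the whole value."""
    return [
        name for name, regex in CONSTRAINTS.items()
        if name.startswith(prefix) and re.fullmatch(regex, wildcard_value)
    ]


print(matching_rules('dna_data', 'sam'))  # ['sam_startswith_dna']
print(matching_rules('rna_data', 'sam'))  # ['sam_not_startswith_dna']
print(matching_rules('data_rna', 'bam'))  # ['bam_endswith_rna']
print(matching_rules('data_dna', 'bam'))  # ['bam_not_endswith_rna']
```

Each value is claimed by exactly one rule of its pair, which is the property the two complementary constraints are meant to give.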

Using it (snakemake 4.6.0, python 3.6):

^{pr2}$

What I think you were doing:

^{3}$

Using it:

$ snakemake -s Snakefile2 dna_data.sam  # runs rule: sam_startswith_dna_

$ snakemake -s Snakefile2 rna_data.sam  # raises MissingRuleException :( :( :(

Here is how to fix it:

# Snakefile3

rule sam_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='dna_.+'
    shell: 'touch {output}'

rule sam_not_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='(?!dna)[^_]{3}_.+'
    shell: 'touch {output}'

Using it:

$ snakemake -s Snakefile3 -n dna_data.sam  # runs rule: sam_startswith_dna_

$ snakemake -s Snakefile3 -n rna_data.sam  # runs rule: sam_not_startswith_dna_

But because of the hard-coded {3}, it is not very general:

$ snakemake -s Snakefile3 -n gdna_data.sam  # raises MissingRuleException
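The gdna_data.sam failure can be reproduced with plain Python. The sketch below assumes the matching model described in this answer (the constraint is wrapped in a named group, '$' is appended, and re.search is used); that model is not verified against the Snakemake source. The helpers output_regex and wildcard_for, and the VARIABLE_WIDTH constraint, are hypothetical and only there to show why simply dropping {3} is not enough under that model.

```python
import re


def output_regex(constraint: str) -> str:
    """Regex for output '{pattern}.sam' under the assumed model: '$' appended, no '^'."""
    return r'(?P<pattern>' + constraint + r')\.sam$'


def wildcard_for(constraint: str, target: str):
    """Return the captured wildcard value, or None when nothing matches."""
    m = re.search(output_regex(constraint), target)
    return m.group('pattern') if m else None


FIXED_WIDTH = r'(?!dna)[^_]{3}_.+'     # constraint from Snakefile3
VARIABLE_WIDTH = r'(?!dna_)[^_]+_.+'   # hypothetical variant without {3}

print(wildcard_for(FIXED_WIDTH, 'rna_data.sam'))      # rna_data
print(wildcard_for(FIXED_WIDTH, 'dna_data.sam'))      # None
print(wildcard_for(FIXED_WIDTH, 'gdna_data.sam'))     # None: prefix is 4 chars, not 3
print(wildcard_for(VARIABLE_WIDTH, 'gdna_data.sam'))  # gdna_data
print(wildcard_for(VARIABLE_WIDTH, 'dna_data.sam'))   # na_data: the search slid past 'd'
```

Under this model the fixed width is what stops the unanchored search from starting one character later, so it buys correctness for dna_ at the cost of generality.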

The following is based on my brief reading of snakemake.io.regex and may contain errors.

In general, given a rule like this:

rule some_rule:
    output: 'some.{pattern}.txt'
    wildcard_constraints: pattern='[a-z_]+'
    shell: 'touch {output}'

and a command-line invocation like this:

$ snakemake some.tar_get.txt

the rule some_rule will be executed if

re.search(r'some\.(?P<pattern>[a-z_]+)\.txt$', 'some.tar_get.txt')

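returns a match (assuming the other checks pass, e.g. ambiguity, cyclic DAG, etc.).

Interestingly, $ is appended to the pattern, but ^ is not prepended.

This behavior differs from what I originally assumed, which was something like the following (this would allow ^ and $ to be used in your wildcard_constraints):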
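What the missing ^ means in practice can be shown with the regex quoted above. This sketch only demonstrates re.search semantics on that regex; whether Snakemake really calls re.search is this answer's reading and has not been checked here. The helper wildcard and the extra target names are made up for illustration.

```python
import re

# Regex quoted above for output 'some.{pattern}.txt' with constraint '[a-z_]+'.
SOME_RULE = re.compile(r'some\.(?P<pattern>[a-z_]+)\.txt$')


def wildcard(target: str):
    """Return the captured wildcard value, or None when the target does not match."""
    m = SOME_RULE.search(target)
    return m.group('pattern') if m else None


print(wildcard('some.tar_get.txt'))      # tar_get
print(wildcard('awesome.tar_get.txt'))   # tar_get: no '^', so extra leading text is tolerated
print(wildcard('some.tar_get.txt.bak'))  # None: '$' forbids trailing text
print(wildcard('some.TARGET.txt'))       # None: constraint allows only [a-z_]
```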

# python3, pseudo-code-ish
# (MissingInputException and run_rule() stand in for Snakemake internals)
import re

output = 'some.{pattern}.txt'
pattern = '[a-z_]+'

target = 'some.tar_get.txt'

# First test: does the target file name match the output (without the constraint)?
m = re.search(r'some\.(?P<pattern>.+)\.txt', target)
if not m:
    raise MissingInputException

# Second test: does the wildcard satisfy the user-supplied constraint?
m = re.search(pattern, m.group('pattern'))
if not m:
    raise MissingInputException

run_rule()
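For anyone who wants to experiment with that two-stage idea, here is a runnable sketch of the pseudo-code above. It describes the behavior I originally expected, not what Snakemake does; the function two_stage_match and the sample targets are hypothetical.

```python
import re


def two_stage_match(target: str, constraint: str):
    """Return the wildcard value if both tests pass, else None."""
    # First test: does the target have the shape 'some.<anything>.txt'?
    shape = re.search(r'some\.(?P<pattern>.+)\.txt', target)
    if shape is None:
        return None
    value = shape.group('pattern')
    # Second test: the constraint sees only the extracted value,
    # so '^' and '$' refer to the start and end of the wildcard itself.
    if re.search(constraint, value) is None:
        return None
    return value


print(two_stage_match('some.tar_get.txt', r'^[a-z_]+$'))  # tar_get
print(two_stage_match('some.rna_x.txt', r'^(?!dna).+$'))  # rna_x
print(two_stage_match('some.dna_x.txt', r'^(?!dna).+$'))  # None
print(two_stage_match('other.txt', r'.+'))                # None
```

Because the constraint is applied to the extracted value alone, an anchored negative lookahead behaves the way one would intuitively expect.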
