Python: iterate over a list and select only the strings made up of specific characters

Posted 2024-04-23 07:28:17


I want to iterate over the list kmers and select only the items that consist solely of the characters A, T, G and C.

kmers=["AL","AT","GC","AA","AP"]

for kmer in kmers:       
    for letter in kmer:
        if letter not in ["A","T","G","C"]:
            pass
        else:
            DNA_kmers.append(kmer)
            print("DNA_kmers",DNA_kmers)

Output:

DNA_kmers ['AL', 'AT', 'AT', 'GC', 'GC', 'AA', 'AA', 'AP']

Expected output:

DNA_kmers=["AT","GC","AA"]

The only way I know of is:

if "B" in kmer or "D" in kmer or "E" in kmer or "F" in kmer or "H" in kmer or "I" in kmer or "J" in kmer or "K" in kmer or "L" in kmer or "M" in kmer or "N" in kmer or "O" in kmer or "P" in kmer or "Q" in kmer or "R" in kmer or "S" in kmer or "U" in kmer or "V" in kmer or "W" in kmer or "X" in kmer or "Y" in kmer or "Z" in kmer:
   pass

1 answer
User
#1 · Posted 2024-04-23 07:28:17

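Your code currently appends an item once for every character in it that matches, which is why "AL" is included and "AT" appears twice. We can adjust it so that an item is appended only when all of its characters match: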

kmers=["AL","AT","GC","AA","AP"]
DNA_kmers =[]

for kmer in kmers:       
    for letter in kmer:
        if letter not in ["A","T","G","C"]:
            break
    else:
        DNA_kmers.append(kmer)

print("DNA_kmers",DNA_kmers)

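In case you are not familiar with Python: I have used an else clause on the for loop. Not every language has this. The else block runs if and only if the loop finishes all of its iterations, that is, without being ended by break.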
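As a minimal sketch of how for/else behaves, the helper below (the name first_invalid_letter and its allowed parameter are made up for illustration, not part of the code above) returns the letter that triggered the break, or None when the else clause ran:

```python
def first_invalid_letter(kmer, allowed="ATGC"):
    """Return the first letter of kmer not in allowed, or None if all are allowed."""
    for letter in kmer:
        if letter not in allowed:
            # break leaves the loop and skips the else clause
            break
    else:
        # reached only when the loop ran to completion without break
        return None
    return letter


print(first_invalid_letter("AT"))  # None
print(first_invalid_letter("AL"))  # L
```

Note that an empty string also reaches the else clause, because a loop with zero iterations never hits break.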

There are much simpler ways to do what you want. For example, the following does the job with a nested list comprehension:

kmers=["AL","AT","GC","AA","AP"]

allowed = set("AGCT")
print([k for k in kmers if all([c in allowed for c in k])])
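A possible variant of the same idea, shown only as a sketch: since allowed is already a set, the per-character check can be written as a subset test. The variable name dna_kmers is chosen here for illustration.

```python
kmers = ["AL", "AT", "GC", "AA", "AP"]
allowed = set("AGCT")

# set(k) <= allowed is True when every distinct character of k is in allowed
dna_kmers = [k for k in kmers if set(k) <= allowed]
print(dna_kmers)  # ['AT', 'GC', 'AA']
```

If you prefer to keep all(), passing a generator expression (all(c in allowed for c in k)) instead of a list lets it stop at the first disallowed character.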

A more general solution, which may also perform better, is to use a regular expression:

import re

kmers=["AL","AT","GC","AA","AP"]
r = re.compile("^[ATGC]*$")
print([k for k in kmers if r.match(k)])
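One caveat with the pattern above: `*` also accepts the empty string, and `$` matches just before a trailing newline, so "" and "AT\n" would both pass. If that matters for your data, a stricter sketch uses re.fullmatch with `+` (the two extra test items in the list are added here for illustration):

```python
import re

# "" and "AT\n" are included only to show the stricter behaviour
kmers = ["AL", "AT", "GC", "AA", "AP", "", "AT\n"]

# fullmatch requires the whole string to match; + requires at least one character
pattern = re.compile(r"[ATGC]+")
dna_kmers = [k for k in kmers if pattern.fullmatch(k)]
print(dna_kmers)  # ['AT', 'GC', 'AA']
```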

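If we restrict the problem to k-mers with k=2, we can optimize further. When matching strings of a fixed length, for example with [AGCT]{2}, the regular expression should perform slightly better. We can also use product to build a set for constant-time lookup: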

import itertools

kmers=["AL","AT","GC","AA","AP"]

allowed = {a+b for a,b in itertools.product("AGCT", repeat=2)}
print([k for k in kmers if k in allowed])
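The same lookup-set idea can be sketched for any k. The helper name build_kmer_set and its alphabet parameter are hypothetical; the set holds 4**k strings, so this is only practical for small k.

```python
import itertools


def build_kmer_set(k, alphabet="AGCT"):
    """Return the set of all strings of length k over alphabet."""
    return {"".join(p) for p in itertools.product(alphabet, repeat=k)}


kmers = ["AL", "AT", "GC", "AA", "AP"]
allowed_2mers = build_kmer_set(2)

print(len(allowed_2mers))  # 16
print([kmer for kmer in kmers if kmer in allowed_2mers])  # ['AT', 'GC', 'AA']
```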
