擅长:python、mysql、java
<p>使用@MartinEvans dictionary的替代实现(不一定更好也更快),但使用<code>re.findall()</code>生成kmer进行测试,并使用<code>map</code>和{<cd3>}代替(显式)内部循环:</p>
<pre><code>from Bio import SeqIO
from re import findall
from itertools import repeat
kmers = {}
with open('20frequent_20mers.txt') as f_kmers:
for line in f_kmers:
kmer, count = line.strip().split('\t')
kmers[kmer] = int(count)
for seq_record in SeqIO.parse("single_mapped.fa", "fasta"):
print(seq_record.id)
# use forward lookahead to make findall() find overlapping results;
score_fre = sum(map(kmers.get, findall(r'(?=([ACTG]{20}))', str(seq_record.seq)), repeat(0)))
print(score_fre)
</code></pre>