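<p>In fact, you are scanning the string far more often than you need to: roughly 20 passes where one would do. For small test sequences that may not matter, but it becomes noticeable as the sequences grow. I would suggest a different approach, which also solves the overlap problem as a side effect:</p>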
<pre><code># `dna` is the sequence string being analysed (defined elsewhere)
nucleotides = ['A', 'T', 'G', 'C']
dinucleotides = [x + y for x in nucleotides for y in nucleotides]
counts = {x: 0 for x in nucleotides + dinucleotides}

# count the first nucleotide, which has no previous one
n_nucl = 1
prevn = dna[0]
counts[prevn] += 1

# count the rest, along with the pairs made with each previous one
for nucl in dna[1:]:
    counts[nucl] += 1
    counts[prevn + nucl] += 1
    n_nucl += 1
    prevn = nucl

# report single nucleotides as percentages of all nucleotides
total = 0.0
for nucl in nucleotides:
    pct = 100.0 * counts[nucl] / n_nucl
    total += pct
    print("{} : {} {}%".format(nucl, counts[nucl], pct))
print("Total : {}%".format(total))

# report dinucleotides; a sequence of n nucleotides has n - 1 adjacent pairs
total = 0.0
for dnucl in dinucleotides:
    pct = 100.0 * counts[dnucl] / (n_nucl - 1)
    total += pct
    print("{} : {} {}%".format(dnucl, counts[dnucl], pct))
print("Total : {}%".format(total))
</code></pre>
<p>This approach scans the string only once, although it is admittedly more code.</p>
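<p>To make the overlap point concrete, here is a minimal sketch of the same single-pass idea wrapped in a function. The name <code>count_single_pass</code> and the sample sequence are hypothetical and chosen only for illustration; the sketch assumes the input contains nothing but A, T, G and C.</p>

```python
def count_single_pass(dna):
    """Count nucleotides and overlapping dinucleotides in one pass.

    Hypothetical helper for illustration; assumes `dna` contains only
    the characters A, T, G and C.
    """
    nucleotides = ['A', 'T', 'G', 'C']
    counts = {x: 0 for x in nucleotides}
    counts.update({x + y: 0 for x in nucleotides for y in nucleotides})

    prevn = ''  # empty until the first nucleotide has been seen
    for nucl in dna:
        counts[nucl] += 1
        if prevn:
            # every adjacent pair is counted, including overlapping ones
            counts[prevn + nucl] += 1
        prevn = nucl
    return counts


counts = count_single_pass("AAAT")
print(counts['A'], counts['AA'], counts['AT'])  # 3 2 1
print("AAAT".count("AA"))  # 1, because str.count skips overlapping matches
```

<p>For <code>"AAAT"</code> the single pass finds two <code>AA</code> pairs (positions 0-1 and 1-2), whereas <code>str.count</code> reports only one because it counts non-overlapping occurrences.</p>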