<p>I would probably do it like this:</p>
<pre><code>chars_i_want = set('atcg')
final_string = ''.join(c for c in start_string if c in chars_i_want)
</code></pre>
<p>This is probably the simplest approach.</p>
<hr/>
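<p>Another option is to let <code>str.translate</code> do the work:</p>
<pre><code>import string
# Python 2 form of str.translate: a table of None plus a string of characters to delete
chars_to_remove = string.printable.translate(None,'acgt')
final_string = start_string.translate(None,chars_to_remove)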
</code></pre>
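<p>The two-argument <code>translate(None, ...)</code> call only exists on Python 2 byte strings. On Python 3, <code>str.translate</code> takes a single mapping, so a rough equivalent could look like the sketch below. The sample value of <code>start_string</code> and the name <code>delete_table</code> are assumptions made for illustration. Like the version above, it only removes printable ASCII characters.</p>

```python
import string

# Hypothetical sample input, used only for illustration
start_string = 'ag ct oso gcota'

# Map the code point of every printable character except a, c, g, t to None;
# str.translate deletes any character whose mapping is None.
delete_table = {ord(c): None for c in string.printable if c not in 'acgt'}

final_string = start_string.translate(delete_table)
print(final_string)
```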
<p>I'm not sure which one performs better. You would need to time them with <code>timeit</code> to know for sure.</p>
<hr/>
<p><strong>Update</strong>: Timings!</p>
<pre><code>import re
import string
import timeit

def test_re(s,regex=re.compile('[^atgc]')):
    # pattern.sub(repl, string): the replacement comes first, then the input
    return regex.sub('',s)

def test_join1(s,chars_keep=set('atgc')):
    return ''.join(c for c in s if c in chars_keep)

def test_join2(s,chars_keep=set('atgc')):
    """ list-comp is faster, but less 'idiomatic' """
    return ''.join([c for c in s if c in chars_keep])

def translate(s,chars_to_remove = string.printable.translate(None,'acgt')):
    # Python 2 form of str.translate: table is None, second argument lists characters to delete
    return s.translate(None,chars_to_remove)

s = 'ag ct oso gcota'
for func in "test_re","test_join1","test_join2","translate":
    print func,timeit.timeit('{0}(s)'.format(func),'from __main__ import s,{0}'.format(func))
</code></pre>
<p>Sadly (for me), <code>regex</code> wins on my machine:</p>
<pre><code>test_re 0.901512145996
test_join1 6.00346088409
test_join2 3.66561293602
translate 1.0741918087
</code></pre>