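<p>I think this old script of mine might be useful.</p>
<p>One thing to note: it downloads the genomes from NCBI rather than ENA. I believe many of these databases are synchronized with each other, though, so you should still be able to find what you want.</p>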
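<p>If you only want to download the genomes for your given accession numbers (~2500), this <em>may</em> not work (unless you filter the returned <code>search_results</code> before downloading them with <code>Entrez.efetch</code>).</p>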
<pre><code>#!/usr/bin/env python3
from Bio import Entrez

search_term = input("Organism name: ")
Entrez.email = "your_email@isp.com"  # required by NCBI

# Run the search and keep the results on NCBI's history server
search_handle = Entrez.esearch(db="nucleotide", term=search_term, usehistory="y", property='complete genome')
search_results = Entrez.read(search_handle)
search_handle.close()

gi_list = search_results["IdList"]
count = int(search_results["Count"])
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]

batch_size = 5  # download sequences in batches so NCBI doesn't time you out

with open("ALL_SEQ.fasta", "w") as out_handle:
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))
        # Fetch the next batch of records from the stored search results
        fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text",
                                     retstart=start, retmax=batch_size,
                                     webenv=webenv, query_key=query_key)
        data = fetch_handle.read()
        fetch_handle.close()
        out_handle.write(data)

print("\nDownload completed")
</code></pre>
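<p>If you already have a fixed list of accession numbers, one alternative is to skip the search step and pass the accessions straight to <code>Entrez.efetch</code> in batches. The following is only a rough sketch of that idea, not a tested replacement for the script above: the helper names (<code>clean_accessions</code>, <code>batches</code>, <code>fetch_accessions</code>), the output file name and the batch size of 50 are all made up for illustration, and it assumes your accessions also resolve in NCBI's <code>nucleotide</code> database.</p>

```python
def clean_accessions(lines):
    """Strip whitespace, drop blank lines and duplicates, keeping the original order."""
    seen = set()
    result = []
    for line in lines:
        acc = line.strip()
        if acc and acc not in seen:
            seen.add(acc)
            result.append(acc)
    return result


def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_accessions(accessions, out_path="BY_ACCESSION.fasta",
                     batch_size=50, email="your_email@isp.com"):
    """Download FASTA records for the given accessions, one batch per request.

    The defaults here are illustrative assumptions, not recommended values.
    """
    # Imported here so the two helpers above can be used without Biopython
    from Bio import Entrez

    Entrez.email = email  # required by NCBI
    with open(out_path, "w") as out_handle:
        for batch in batches(accessions, batch_size):
            # efetch accepts several IDs as one comma-separated string
            fetch_handle = Entrez.efetch(db="nucleotide", id=",".join(batch),
                                         rettype="fasta", retmode="text")
            out_handle.write(fetch_handle.read())
            fetch_handle.close()
```

<p>With one accession per line in a text file (the name <code>accessions.txt</code> is hypothetical), you would call something like <code>fetch_accessions(clean_accessions(open("accessions.txt")))</code>. Nothing here reports accessions that NCBI fails to return, so it is worth comparing the number of records in the output against the length of your list.</p>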