BioPython: Extracting sequence IDs from a BLAST output file

3 votes

2 answers

7970 views

Asked 2025-04-15 15:43

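I have a BLAST output file in XML format. It contains 22 query sequences, and 50 hits are reported for each one. I want to extract all 50x22 hits, but my current code only extracts the 50 hits for the first query.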

from Bio.Blast import NCBIXML

result_handle = open("my_blast.xml")  # hypothetical path to the BLAST XML file
blast_records = NCBIXML.parse(result_handle)
blast_record = next(blast_records)  # takes only the FIRST query's record

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()

Can anyone give me some advice on how to extract all the hits? I guess I need to use something other than alignments. I hope that makes sense. Thanks!

Jon

data-processing xml-parsing bioinformatics sequence-extraction hits blast

2 Answers

I use this code to extract all the results:

from Bio.Blast import NCBIXML

with open("rpoD.xml") as result_handle:
    for record in NCBIXML.parse(result_handle):
        print("QUERY: %s" % record.query)
        for align in record.alignments:
            print(" MATCH: %s..." % align.title[:60])
            for hsp in align.hsps:
                print(" HSP, e=%f, from position %i to %i"
                      % (hsp.expect, hsp.query_start, hsp.query_end))
                if hsp.align_length < 60:
                    print("  Query: %s" % hsp.query)
                    print("  Match: %s" % hsp.match)
                    print("  Sbjct: %s" % hsp.sbjct)
                else:
                    # Long alignments: show only the first 57 characters
                    print("  Query: %s..." % hsp.query[:57])
                    print("  Match: %s..." % hsp.match[:57])
                    print("  Sbjct: %s..." % hsp.sbjct[:57])

print("Done")

Or, to extract less detail:

from Bio.Blast import NCBIXML

with open("NC_003197.xml") as result_handle:
    for record in NCBIXML.parse(result_handle):
        # We want to ignore any queries with no search results:
        if record.alignments:
            print("QUERY: %s..." % record.query[:60])
            for align in record.alignments:
                for hsp in align.hsps:
                    print(" %s HSP, e=%f, from position %i to %i"
                          % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
print("Done")

I got this from the following website:

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/
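If Biopython is not available, roughly the same "less detail" listing can be sketched with Python's standard-library xml.etree.ElementTree. This is a swapped-in technique, not what the code above uses, and it assumes the standard NCBI BLAST XML element names (Iteration, Iteration_query-def, Hit, Hit_id, Hsp, Hsp_evalue, Hsp_query-from, Hsp_query-to). SAMPLE_XML is a made-up, heavily trimmed fragment and iter_hsps is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed BLAST XML for illustration only: one query with
# two hits, and one query with no hits at all.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|111</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-30</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>120</Hsp_query-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>gi|222</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>0.002</Hsp_evalue>
              <Hsp_query-from>15</Hsp_query-from>
              <Hsp_query-to>60</Hsp_query-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query_2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iter_hsps(root):
    """Yield (query, hit_id, evalue, query_from, query_to) for every HSP of every query.

    Queries without hits yield nothing, which mirrors `if record.alignments:`.
    """
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            hit_id = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                yield (query, hit_id,
                       float(hsp.findtext("Hsp_evalue")),
                       int(hsp.findtext("Hsp_query-from")),
                       int(hsp.findtext("Hsp_query-to")))


# For a real file you would use: root = ET.parse("NC_003197.xml").getroot()
root = ET.fromstring(SAMPLE_XML)
for query, hit_id, evalue, start, end in iter_hsps(root):
    print("%s: %s HSP, e=%g, from position %i to %i" % (query, hit_id, evalue, start, end))
```

Because iter_hsps is a generator, it walks every Iteration (one per query), so all queries are covered without calling next() by hand.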

Answered 2025-04-15 by Python大师

This code should get all the records. Compared with your original code, the new part is

for blast_record in blast_records

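which is the Python idiom for iterating over a "list-like" object such as blast_records (the NCBIXML module documentation confirms that parse() returns an iterator). Your version called next() once, so it only ever read the first query's record.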
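To see the difference between next() and a for loop, here is a minimal sketch that uses a plain generator, fake_parse() (hypothetical, standing in for NCBIXML.parse(), which likewise returns an iterator). It shows that next() pulls a single record, a loop pulls every remaining record, and the iterator is empty afterwards.

```python
from types import SimpleNamespace


def fake_parse(query_names):
    """Hypothetical stand-in for NCBIXML.parse(): a generator yielding one record per query."""
    for name in query_names:
        yield SimpleNamespace(query=name, alignments=[])


records = fake_parse(["q1", "q2", "q3"])

first = next(records)  # what the question's code did: only the first record
print(first.query)  # q1

# A for loop (here a list comprehension) keeps pulling records until the iterator is exhausted.
remaining = [record.query for record in records]
print(remaining)  # ['q2', 'q3']

# An iterator can only be consumed once: parse the file again to re-read it.
print(list(records))  # []
```

The same applies to the real parser: if you need to loop over the records twice, either call NCBIXML.parse() again on a freshly opened handle or store the records in a list first.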

from Bio.Blast import NCBIXML

result_handle = open("my_blast.xml")  # hypothetical path to the BLAST XML file
blast_records = NCBIXML.parse(result_handle)

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for blast_record in blast_records:  # loop over ALL query records, not just the first
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
    # here you could write something to the file between each blast_record
save_file.close()
result_handle.close()
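Note that this loop still writes the title once per HSP, so a hit with several HSPs gets a repeated header line. Below is a minimal sketch of a variant that writes one header per hit for every query. The function name write_hit_titles is hypothetical, and the test data are SimpleNamespace stand-ins for Biopython's record and alignment objects (only .alignments and .title are used); "my_blast.xml" in the final comment is likewise a hypothetical path.

```python
import io
from types import SimpleNamespace


def write_hit_titles(blast_records, out):
    """Write one '>title' line per hit (alignment) for every query record.

    blast_records: any iterable of records with an .alignments list whose items
    have a .title, e.g. the iterator returned by NCBIXML.parse(). Returns the
    number of lines written.
    """
    count = 0
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            # One header per hit, however many HSPs the hit has.
            out.write(">%s\n" % alignment.title)
            count += 1
    return count


# Hypothetical stand-in data: two queries, three hits; hit A has two HSPs.
hit_a = SimpleNamespace(title="gi|1 hit A", hsps=["hsp1", "hsp2"])
hit_b = SimpleNamespace(title="gi|2 hit B", hsps=["hsp1"])
hit_c = SimpleNamespace(title="gi|3 hit C", hsps=["hsp1"])
records = [SimpleNamespace(alignments=[hit_a, hit_b]),
           SimpleNamespace(alignments=[hit_c])]

buf = io.StringIO()
n = write_hit_titles(records, buf)
print(n)  # 3
print(buf.getvalue(), end="")

# With Biopython and a real file this would be (not run here):
# from Bio.Blast import NCBIXML
# with open("my_blast.xml") as result_handle, \
#         open("/Users/jonbra/Desktop/my_fasta_seq.fasta", "w") as save_file:
#     write_hit_titles(NCBIXML.parse(result_handle), save_file)
```

For the question's file, if every one of the 22 queries really has 50 hits, the returned count should be 22 x 50 = 1100, which is a quick check that no query was skipped.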

Answered 2025-04-15 by Python大师
