读取多个blast文件（biopython）

1 投票

2 回答

1512 浏览

提问于 2025-04-17 19:04

我正在尝试读取一系列通过多次提交到NCBI blast网站生成的XML文件。我想从每个文件中打印出特定的信息行。

我想读取的文件都带有后缀"_recombination.xml"。

for file in glob.glob("*_recombination.xml"):
    result_handle= open(file)
    blast_record=NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct

这个脚本首先找到所有带有"_recombination.xml"后缀的文件，然后我希望它能读取每个文件，并打印出特定的行（这基本上是直接从BioPython的食谱中复制的），看起来它是能做到的。但是我遇到了以下错误：

Traceback (most recent call last):
File "Scripts/blast_test.py", line 202, in <module>
blast_record=NCBIXML.read(result_handle)
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
first = iterator.next()
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
expat_parser.Parse("", True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594

我不太确定问题出在哪里。我不确定它是否在尝试重新遍历已经读取过的文件——例如，关闭文件似乎有帮助：

for file in glob.glob("*_recombination.xml"):
    result_handle= open(file)
    blast_record=NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct
    result_handle.close()
    blast_record.close()

但这也给我带来了另一个错误：

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 213, in <module> blast_record.close() 
AttributeError: 'Blast' object has no attribute 'close'

错误处理 xml文件文件读取数据解析 ncbi biopython blast

2 个回答

我想给Biogeek的回答加个评论，但我还没达到足够的声望，所以不能。其实他说得对，你应该使用

NCBIXML.parse(open(input_xml))

而不是用NCBIXML.read(open(input_xml))，因为你是“想要读取一系列的XML文件”，对于一系列的XML文件，你需要解析而不是读取。这样解决你的问题了吗？

回答于 2025-04-17 由 Python大师

分享举报

我通常使用parse方法，而不是read方法，也许这对你有帮助：

for blast_record in NCBIXML.parse(open(input_xml)):
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct

并且确保你的xml文件是在查询时使用-outfmt 5生成的

回答于 2025-04-17 由 Python大师

分享举报

读取多个blast文件（biopython）

2 个回答

撰写回答