Biopython:将蛋白质片段从PDB导出到FASTA文件

2024-03-29 07:33:51 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在将PDB蛋白质序列片段写入fasta格式,如下所示

from Bio.SeqIO import PdbIO, FastaIO

def get_fasta(pdb_file, fasta_file, transfer_ids=None):
    fasta_writer = FastaIO.FastaWriter(fasta_file)
    fasta_writer.write_header()
    for rec in PdbIO.PdbSeqresIterator(pdb_file):
        if len(rec.seq) == 0:
            continue
        if transfer_ids is not None and rec.id not in transfer_ids:
            continue
        print(rec.id, rec.seq, len(rec.seq))
        fasta_writer.write_record(rec)

get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

它给出了以下错误

AttributeError                            Traceback (most recent call last)
<ipython-input-9-8ecf92753ac9> in <module>
     12         fasta_writer.write_record(rec)
     13 
---> 14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
     15 get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
     16 get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

<ipython-input-9-8ecf92753ac9> in get_fasta(pdb_file, fasta_file, transfer_ids)
     10             continue
     11         print(rec.id, rec.seq, len(rec.seq))
---> 12         fasta_writer.write_record(rec)
     13 
     14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])

~/anaconda3/envs/bioinformatics/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py in write_record(self, record)
    303     def write_record(self, record):
    304         """Write a single Fasta record to the file."""
--> 305         assert self._header_written
    306         assert not self._footer_written
    307         self._record_written = True

AttributeError: 'FastaWriter' object has no attribute '_header_written'

我四处搜索并选中了thisthisthis,但无法解决问题。 完整的代码是here,问题在最后一个单元格中

编辑:我正在使用

conda version : 4.8.3
conda-build version : 3.18.11
python version : 3.7.6.final.0
biopython version : 1.77.dev0 

Tags: inselfidsgetopenrecordseqfasta
2条回答

你可以用Biopython做这个

from Bio import SeqIO
pdbfile = '2tbv.pdb'
with open(pdbfile) as handle:
    sequence = next(SeqIO.parse(handle, "pdb-atom"))
with open("2tbv.fasta", "w") as output_handle:
    SeqIO.write(sequence, output_handle, "fasta")

我不确定我不使用的fasta_writer,但您可以将所需的字符串序列存储到listdict中,然后手动将它们写入fasta:

## with list
data = '>'+'\n>'.join([f'{i}\n{seq}' for i, seq in enumerate(seq_list)])+'\n'
## or with dict
data = '>'+'\n>'.join([f'{name}\n{seq}' for name, seq in seq_dict.iteritems()])+'\n' 

with open('path/to/my-fasta-file.fasta', 'wt') as f:
    f.write(data)

(只有当data的末尾都是较大循环的一部分时,seq_list的批写入同一个fasta文件,才需要在data的末尾添加新行)

相关问题 更多 >