从PDB中移除杂原子
需要把pdb文件中的杂原子去掉。这里有一段代码,但在我的测试PDB 1C4R上没有效果。
for model in structure:
for chain in model:
for reisdue in chain:
id = residue.id
if id[0] != ' ':
chain.detach_child(id)
if len(chain) == 0:
model.detach_child(chain.id)
有什么建议吗?
2 个回答
0
我曾经使用过一段代码,标题是去除残留物,可以在这个链接找到:http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython
这段代码会漏掉一些异原子。我猜可能是因为每次调用detach_child的时候,chain的值都会变化。
for model in structure:
for chain in model:
for reisdue in chain:
id = residue.id
if id[0] != ' ':
chain.detach_child(id)
if len(chain) == 0:
model.detach_child(chain.id)
我按照下面的方式修改后(就是避免动态修改可迭代对象),代码就正常工作了。(我这里只用了structure[0]。)
model = structure[0]
residue_to_remove = []
chain_to_remove = []
for chain in model:
for residue in chain:
if residue.id[0] != ' ':
residue_to_remove.append((chain.id, residue.id))
if len(chain) == 0:
chain_to_remove.append(chain.id)
for residue in residue_to_remove:
model[residue[0]].detach_child(residue[1])
for chain in chain_to_remove:
model.detach_child(chain)
5
异原子不应该是链的一部分。不过,你可以通过以下方法来判断一个残基是否是异原子:
pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
for residue in pdb.get_residues():
tags = residue.get_full_id()
# tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
# Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)
# Thus you're interested in the Hetero Field, that is empty if the residue
# is not a hetero atom or have some flag if it is (W for waters, H, etc.)
if tags[3][0] != " ":
# The residue is a heteroatom
else:
# It is not
你还可以通过以下方式获取残基的ID(不包括前三个字段):
tags = residue.id
# or het_flag,_ ,_ = residue.id
if tags[0] != " ":
# The residue is a heteroatom
else:
# It is not
我这里加了一个相关文档的链接: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf
这个主题在第8页,标题是“什么是残基ID?”引用如下:
这有点复杂,因为PDB格式比较笨重。残基ID是一个包含三个元素的元组:
- 异原子标志:这是'H_'加上异残基的名称(例如,葡萄糖分子的情况下是'H_GLC'),或者在水分子的情况下是'W'。
要添加注释并继续:
from Bio.PDB import PDBParser, PDBIO, Select
class NonHetSelect(Select):
def accept_residue(self, residue):
return 1 if residue.id[0] == " " else 0
pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())