from collections import defaultdict

# Map each PDB ID to the list of its chain labels, in file order
results = defaultdict(list)
with open("PDBs.txt") as fh:
    for line in fh:
        line = line.strip()
        if line:  # skip blank lines
            pdb, chain = line.split("_")
            results[pdb].append(chain)

# Note that you would need to extend this if more than 4 chains are possible
prefix = {2: "second", 3: "third", 4: "fourth"}

with open("chains.txt", "w") as fh:
    for pdb, chains in results.items():
        fh.write(f"First chain of {pdb} is {chains[0]}")
        # Remaining chains: ii starts at 1, so ii + 1 is the chain's position
        for ii, chain in enumerate(chains[1:], start=1):
            fh.write(f" and {prefix[ii + 1]} is {chain}")
        fh.write("\n")
Contents of "chains.txt":
First chain of 150L is A and second is B and third is C and fourth is D
First chain of 16GS is A and second is B
First chain of 17GS is A and second is B
First chain of 18GS is A and second is B
First chain of 19GS is A and second is B
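The code comment about more than 4 chains could be handled with a fallback for positions outside the word table. The following is only a sketch: the helper names `ordinal` and `describe` and the PDB ID "XXXX" are made up for illustration, and the switch to numeric ordinals such as "5th" is an assumption about what output would be acceptable.

import-free example:

```python
# Hypothetical helpers, not part of the original answer.
ORDINAL_WORDS = {1: "first", 2: "second", 3: "third", 4: "fourth"}


def ordinal(n):
    """Return a word for small n, otherwise a numeric ordinal such as '5th'."""
    if n in ORDINAL_WORDS:
        return ORDINAL_WORDS[n]
    # 11th, 12th and 13th are irregular, so treat 10..20 as "th"
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe(pdb, chains):
    """Build one output line in the same shape as the lines in chains.txt."""
    parts = [f"First chain of {pdb} is {chains[0]}"]
    for position, chain in enumerate(chains[1:], start=2):
        parts.append(f"{ordinal(position)} is {chain}")
    return " and ".join(parts)


# "XXXX" is a placeholder ID with five made-up chains
print(describe("XXXX", ["A", "B", "C", "D", "E"]))
# First chain of XXXX is A and second is B and third is C and fourth is D and 5th is E
```

With this fallback, a PDB with many chains no longer raises a KeyError from the lookup table.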
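You can do this with nothing more than split operations and a loop. First split the data on whitespace to get a list. Each chunk then consists of a key and a value separated by an underscore. Iterate over all the chunks and split each one into its key and value. Then simply build a Python dictionary that holds, for each key, a list of all its values.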
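The steps just described might look roughly like this. It is a minimal sketch, assuming the input is whitespace-separated chunks of the form `ID_chain`; the function name `group_chains` and the sample string are illustrative and not taken from the question.

```python
from collections import defaultdict


def group_chains(text):
    """Group chain labels by PDB ID from whitespace-separated 'ID_chain' chunks."""
    groups = defaultdict(list)
    for chunk in text.split():            # split on any run of whitespace
        # Assumes every chunk contains an underscore; otherwise this raises ValueError
        key, value = chunk.split("_", 1)
        groups[key].append(value)
    return dict(groups)


# Illustrative input only
sample = "150L_A 150L_B 16GS_A 16GS_B"
print(group_chains(sample))
# {'150L': ['A', 'B'], '16GS': ['A', 'B']}
```

Because `str.split()` with no argument treats newlines like any other whitespace, the same function works on the full contents of a file read with `fh.read()`.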
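You can achieve this by first reading the file and extracting the PDB IDs and chain labels into a dictionary that maps each PDB ID to a list of chain labels (called `results` here). You can then write the "chains.txt" file line by line by iterating over these results and constructing the output lines you specified.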