How to iterate over a FASTA file with Biopython and change the record IDs

Asked 2025-04-18 00:12

I'm not a programmer and I'm new to Python; I'm teaching myself. I have a file with 84 entries in the following format:

1
2
3
X
Y
MT
GL000210.1

I want to change the record IDs of all the sequences in a FASTA file that contains 84 records. Here is an example of the FASTA file:

>name
agatagctagctgatcgatcgatttttttcga
>name1
gagatagatattattttttttttaagagagagcgcgatcgatgc
>name2
agatgctagggc
...

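Specifically, I want to replace the first record ID (the line starting with ">") with the first entry of the example file above, the second ID with the second entry, and so on. So far I have written a script that can change the IDs one at a time, but I don't know how to iterate over both files at once: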

from Bio import SeqIO

records = list(SeqIO.parse("new_human_v37.fasta", "fasta"))
modified_record = records[0]
print(modified_record.id.replace("old_name", "first_entry_file1"))

The output file should look like this:

>1
agatagctagctgatcgatcgatttttttcga
>2
gagatagatattattttttttttaagagagagcgcgatcgatgc
>3
agatgctagggc
...

Can anyone help me?

2 Answers

Try this.

# Writes the renamed records to a new file, "fasta_file_new.fasta".
# Opening it with "w" creates it, so it need not exist beforehand.
with open("new_human_v37.fasta") as fasta_file_read, \
     open("replacer.txt") as replace_lines, \
     open("fasta_file_new.fasta", "w") as fasta_file_new:
    for f in fasta_file_read:
        if f.startswith(">"):
            # Header line: substitute the next ID from replacer.txt.
            # rstrip() plus "\n" keeps the header on its own line even if
            # the last ID in replacer.txt has no trailing newline.
            fasta_file_new.write(">" + replace_lines.readline().rstrip() + "\n")
        else:
            # Sequence line: copy unchanged.
            fasta_file_new.write(f)
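
The loop above silently writes empty headers if replacer.txt runs out of IDs before the FASTA file runs out of records. A minimal sketch of the same line-by-line idea with a count check follows; the function name rename_fasta_headers and the error messages are made up for illustration, and it works on any iterables of lines (open file objects included).

```python
def rename_fasta_headers(fasta_lines, new_ids):
    """Yield FASTA lines, replacing each ">" header with the next new ID."""
    # Skip blank lines in the ID source and strip surrounding whitespace.
    ids = (s.strip() for s in new_ids if s.strip())
    for line in fasta_lines:
        if line.startswith(">"):
            try:
                new_id = next(ids)
            except StopIteration:
                raise ValueError("more FASTA records than replacement IDs") from None
            yield ">" + new_id + "\n"
        else:
            # Sequence line: copy it, making sure it ends with a newline.
            yield line if line.endswith("\n") else line + "\n"
    # Any ID left over means the two inputs did not line up.
    if next(ids, None) is not None:
        raise ValueError("more replacement IDs than FASTA records")


if __name__ == "__main__":
    fasta = [">name\n", "agatagct\n", ">name1\n", "gagataga\n"]
    print("".join(rename_fasta_headers(fasta, ["1\n", "2\n"])), end="")
```

With real files, the two iterables would be the opened FASTA file and the opened ID file, and the yielded lines could be passed to writelines() on the output file.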

Answered 2025-04-18 by Python大师

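If you want to produce a new file containing the modified records, you can do it like this (assuming the file of new IDs has as many lines as the FASTA file has records).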

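from Bio import SeqIO

# File with one new ID per line; the file name here is an assumption.
my_lines_file = "replacer.txt"

with open(my_lines_file) as lines_file, open("example.fa", "w") as fout:
    # SeqIO.parse yields records one at a time, in file order.
    for r in SeqIO.parse("new_human_v37.fasta", "fasta"):
        line = lines_file.readline()  # file objects have readline(), not getline()
        r.id = line.rstrip()
        # Clear the description, otherwise the old name is written after the new ID.
        r.description = ""
        SeqIO.write(r, fout, "fasta")  # argument order: record(s), handle, format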
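
Since the question asks how to walk both files at the same time, here is a hedged sketch that pairs the records and the IDs with itertools.zip_longest and raises an error if the counts differ. The helper name renamed_records is hypothetical, and the demo uses in-memory StringIO handles instead of the real files so that it runs on its own.

```python
from io import StringIO
from itertools import zip_longest

from Bio import SeqIO


def renamed_records(fasta_handle, new_ids):
    """Yield SeqRecords from fasta_handle, renamed with new_ids in order."""
    # Strip whitespace and ignore blank lines in the ID source.
    ids = (s.strip() for s in new_ids if s.strip())
    records = SeqIO.parse(fasta_handle, "fasta")
    # zip_longest pads the shorter input with None, which exposes a mismatch.
    for record, new_id in zip_longest(records, ids):
        if record is None or new_id is None:
            raise ValueError("number of records and number of IDs differ")
        record.id = new_id
        record.description = ""  # keep the old name out of the header
        yield record


if __name__ == "__main__":
    fasta = StringIO(">name\nagatagct\n>name1\ngagataga\n")
    out = StringIO()
    # SeqIO.write accepts an iterator of records and returns how many it wrote.
    SeqIO.write(renamed_records(fasta, ["1", "2"]), out, "fasta")
    print(out.getvalue(), end="")
```

For the real data, fasta_handle would be the opened new_human_v37.fasta, new_ids the opened ID file, and out an output file opened with "w".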

Answered 2025-04-18 by Python大师
