在DNA序列文件中特定位置替换核苷酸
我有一个fasta文件,还有一个文件里面包含了位置的信息。我想在每个序列的特定位置用一个默认值来替换,比如说,我的位置文件看起来像这样:a/c 120,而我的替换表是把a/c替换成W,所以我想得到一个新的fasta文件,把位置120的内容替换成w。
这个程序是用Python写的。
现在第一个问题是,我无法找到正确的位置。例如,如果我用my_seq_id[0:3],我得到的是序列的名字,而不是序列本身。我的位置文件看起来像这样:
id1 219 A/C
from Bio import SeqIO
import sys
import string
userInput1=raw_input("enter your sequence:")
userInput2=raw_input('enter your position file:')
fasta_file=userInput1
position_file=userInput2
result_file="outfile.txt"
id_list=list()
position_list=list()
nucleotide_list=list()
with open(position_file) as f:
for line in f:
line=line.strip()
headerline = line.split()
position=headerline[1]
ID=headerline[0]
nucleotide=headerline[2]
nucleotide_list.add(nucleotide)
position_list.add(position)
id_list.add(ID)
fasta_sequence=SeqIO.parse(open(fasta_file), 'fasta')
with open(result_file, 'w') as f:
if seq_record.id in wanted and nucleotide_list="A/C":
seq_record[position_list]="W\n"
SeqIO.write([seq_record], f, "fasta")
1 个回答
0
你的代码有点让人困惑,应该不是这样:
fasta_sequence=SeqIO.parse(open(fasta_file), 'fasta')
with open(result_file, 'w') as f:
if seq_record.id in wanted and nucleotide_list="A/C":
seq_record[position_list]="W\n"
SeqIO.write([seq_record], f, "fasta")
而是这样:
fasta_sequence=SeqIO.parse(open(fasta_file), 'fasta')
with open(result_file, 'w') as f:
if fasta_sequence[j].id in wanted and nucleotide_list[i]=="A/C":
fasta_sequence[j].seq[position_list]="W\n"
SeqIO.write([fasta_sequence[j]], f, "fasta")
这里用i作为一个计数器来遍历nucleotide_list,用j来遍历fasta_sequence。
你也可以这样遍历fasta_sequence:
for record in SeqIO.parse(StringIO(data), "fasta"):
print("%s %s" % (record.id, record.seq))
这里id是每个元素的标识符,seq是它的序列。
---- 更新 ----
我想我明白你想做什么了,就是遍历文件中的每一条记录,检查id是否匹配,然后在那个位置更改序列,应该这样做:
#goes through each record in the file
for record in SeqIO.parse(StringIO(data), "fasta"):
# check if id is wanted
if record.id in id_list:
# get list of every item in id_list that matches record.id
positions_of_id_in_id_list = [i for i, j in enumerate(id_list) if j == record.id]
for elem_position_lists in positions_of_id_in_id_list:
# I think here you want to write the new record in the correct position (substitute "W\n" with new item. Maybe nucleotide[elem_position_lists]?)
record.seq[position_list[elem_position_lists]] = "W\n"
# write new file
SeqIO.write(fasta_sequence, f, "fasta")