如何使用Python根据BED文件格式修改坐标格式？

from Bio import SeqIO output_file = open('fileC.bed','w') for long_sequence_record in SeqIO.parse(open('fileA.fasta'), 'fasta'): long_sequence = str(long_sequence_record.seq) for short_sequence_record in SeqIO.parse(open('fileB.fasta'), 'fasta'): short_sequence = str(short_sequence_record.seq) if short_sequence in long_sequence: start = long_sequence.index(short_sequence) + 1 stop = start + len(short_sequence) - 1 # print short_sequence_record.id, start, stop output_line ='%s\t%i\t%i\n' % \ (short_sequence_record.id,start,stop) output_file.write(output_line ) output_file.close()

2条回答

网友

1楼 · 编辑于 2024-04-19 00:29:55

好吧，你在索引上加了一个1，在这里你可以找到一个较短的序列-

start = long_sequence.index(short_sequence) + 1 < - notice the +1

别那么做，应该没事的。也不要对stop变量执行-1。你知道吗

您应该从id添加起始序列号

示例-

start = long_sequence.index(short_sequence) + int((short_sequence_record.id.split(':')[1].split('-')[0]))
stop = start + len(short_sequence)

对于记录的id，如果您不希望在:之前有任何内容，那么您应该在:处分割id，并获取左侧部分（分割后的索引字符串为0）。你知道吗

示例-

output_line ='%s\t%i\t%i\n' % \
        ((short_sequence_record.id.split(':')[0]),start,stop)
        output_file.write(output_line )

网友

2楼 · 编辑于 2024-04-19 00:29:55

更一般的解决方案：

获取引用序列数据并使用relevant UCSC Kent Tools将其准备就绪。你知道吗
执行BLAT search将短查询字符串与引用数据对齐，并获得一个PSL文件。你知道吗
Convert the PSL output去睡觉。你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章