Changing the orientation of reverse sequences in a FASTA file is not working

Posted 2024-04-29 09:56:45


I am trying to get the reverse sequences in my file into the correct orientation. Here is the code:

```python
import os
import sys

import pysam
from Bio import SeqIO, Seq, SeqRecord


def main(in_file):
    out_file = "%s.fa" % os.path.splitext(in_file)[0]
    with open(out_file, "w") as out_handle:
        # Write records from the BAM file one at a time to the output file.
        # Works lazily as BAM sequences are read so will handle large files.
        SeqIO.write(bam_to_rec(in_file), out_handle, "fasta")


def bam_to_rec(in_file):
    """Generator to convert BAM files into Biopython SeqRecords.
    """
    bam_file = pysam.AlignmentFile(in_file, "rb")
    for read in bam_file:
        seq = Seq.Seq(read.query_sequence)
        if read.is_reverse:
            # BAM stores reverse-strand reads reverse-complemented;
            # flip them back to the orientation they were sequenced in.
            seq = seq.reverse_complement()
        rec = SeqRecord.SeqRecord(seq, read.query_name, "", "")
        yield rec


if __name__ == "__main__":
    main(*sys.argv[1:])
```
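For reference, the per-read decision in bam_to_rec() can be sketched without pysam or Biopython. This is a minimal illustration, not the library code: it assumes raw SAM FLAG integers, where bit 0x10 is the bit pysam exposes as `is_reverse`, and upper-case A/C/G/T/N reads. The name `orient_read` is hypothetical.

```python
# Bit 0x10 of the SAM/BAM FLAG field marks a read aligned to the reverse
# strand, whose stored SEQ is therefore reverse-complemented.
# pysam's AlignedSegment.is_reverse reports this same bit.
REVERSE_FLAG = 0x10

# Assumption: reads contain only upper-case A, C, G, T and N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def orient_read(seq: str, flag: int) -> str:
    """Return the read in its original sequencing orientation.

    Mirrors the branch in bam_to_rec(): only reverse-strand reads are flipped.
    """
    if flag & REVERSE_FLAG:
        # Complement each base, then reverse the string.
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    print(orient_read("GGGTTAGG", 16))  # prints CCTAACCC (reverse strand)
    print(orient_read("GGGTTAGG", 0))   # prints GGGTTAGG (forward strand)
```

Because the test is a bitwise AND, paired-end flags work too: 83 has bit 0x10 set and is flipped, while 99 does not and is left alone.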

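The code works when I print out the reverse sequences, but in the output file they are written in the reverse order. Can someone help me figure out what is going wrong? Here is my shared link: https://www.dropbox.com/sh/68ui8l7nh5fxatm/AABUr82l01qT1nL8I_XgJaeTa?dl=0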


1 answer

Answer 1 · posted 2024-04-29 09:56:45

Note that the ugly counter is only there to limit the output to 10,000 sequences.
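As an aside, the same cap can be written with `itertools.islice` instead of a manual counter. The sketch below uses a hypothetical `records()` generator as a stand-in for bam_to_rec() and a cap of 3 instead of 10000. It logs every record it produces, which shows that islice stops pulling from the generator once the limit is reached.

```python
from itertools import islice
from typing import Iterator, List


def records(total: int, log: List[int]) -> Iterator[str]:
    """Hypothetical stand-in for bam_to_rec(); appends to `log` per record made."""
    for n in range(total):
        log.append(n)
        yield f"read{n}"


if __name__ == "__main__":
    # Same idea as the counter loop, capped at 3 instead of 10000. With the
    # real generator it would read:
    #     for s in islice(bam_to_rec(in_file), 10000):
    #         SeqIO.write(s, out_handle, "fasta")
    produced: List[int] = []
    taken = list(islice(records(1_000_000, produced), 3))
    print(taken)          # prints ['read0', 'read1', 'read2']
    print(len(produced))  # prints 3
```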

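I compared a version that never reverses with one that reverses where needed. Below is the output for a couple of sequences; feel free to test it yourself. I think your problem is that a function using yield returns a generator, and you never iterate over it, unless I have misunderstood what you are doing: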
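One way to check the generator hypothesis: `Bio.SeqIO.write` is documented to accept a list or an iterator of `SeqRecord` objects (or a single record), and it loops over that input itself. So passing `bam_to_rec(in_file)` straight to it, as the question's main() does, should already drive the generator. The pure-Python sketch below mimics that pattern. `make_records()` and `write_fasta()` are hypothetical stand-ins, not Biopython code.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def make_records() -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for bam_to_rec(): yields (id, sequence) lazily."""
    yield "read1", "ACGT"
    yield "read2", "GGCC"


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO) -> int:
    """Write (id, sequence) pairs as FASTA and return how many were written.

    The for-loop here consumes the generator, so the caller never has to.
    """
    count = 0
    for rec_id, seq in records:
        handle.write(f">{rec_id}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    print(write_fasta(make_records(), buf))  # prints 2
    print(buf.getvalue(), end="")
```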

For a read that is reversed:

Original:

SOLEXA-1GA-2:2:93:1281:961#0 GGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG

Becomes:

SOLEXA-1GA-2:2:93:1281:961#0 CTAACCCTAACCCTAACCCTAACCCTAACCTAACCC

For a read that is not reversed:

Original:

SOLEXA-1GA-2:2:12:96:1547#0 ACACACAAACACACACACACACACACACACACCCCC

Becomes:

SOLEXA-1GA-2:2:12:96:1547#0 ACACACAAACACACACACACACACACACACACCCCC

Here's my code:

```python
import os
import sys

import pysam
from Bio import SeqIO, Seq, SeqRecord


def main(in_file):
    out_file = "%s.fa" % os.path.splitext(in_file)[0]
    with open('test_non_reverse.txt', 'w') as non_reverse:
        with open(out_file, "w") as out_handle:
            # Write records from the BAM file one at a time to the output file.
            # Works lazily as BAM sequences are read so will handle large files.
            # First pass: reverse-strand reads are reverse-complemented.
            i = 0
            for s in bam_to_rec(in_file):
                if i == 10000:
                    break
                i += 1
                SeqIO.write(s, out_handle, "fasta")
            # Second pass: the same reads written as stored, never reversed.
            i = 0
            for s in convert_to_seq(in_file):
                if i == 10000:
                    break
                i += 1
                SeqIO.write(s, non_reverse, 'fasta')


def convert_to_seq(in_file):
    """Generator that yields every read exactly as stored in the BAM file."""
    bam_file = pysam.AlignmentFile(in_file, "rb")
    for read in bam_file:
        seq = Seq.Seq(read.query_sequence)
        rec = SeqRecord.SeqRecord(seq, read.query_name, "", "")
        yield rec


def bam_to_rec(in_file):
    """Generator to convert BAM files into Biopython SeqRecords.
    """
    bam_file = pysam.AlignmentFile(in_file, "rb")
    for read in bam_file:
        seq = Seq.Seq(read.query_sequence)
        if read.is_reverse:
            seq = seq.reverse_complement()
        rec = SeqRecord.SeqRecord(seq, read.query_name, "", "")
        yield rec


if __name__ == "__main__":
    main(*sys.argv[1:])
```
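The code above writes two FASTA files: one from bam_to_rec() and one from convert_to_seq(). Instead of eyeballing them, the comparison can be automated. This is a minimal sketch that assumes both files have been read into strings and contain upper-case A/C/G/T/N reads. `parse_fasta()` and `classify()` are hypothetical helpers, not Biopython functions. The sample data is the two reads shown in the answer.

```python
from typing import Dict, Optional

# Assumption: upper-case A/C/G/T/N reads only.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-separated word after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def classify(oriented: str, as_stored: str) -> Dict[str, str]:
    """Label each read as 'unchanged', 'reverse-complemented' or 'mismatch'.

    `oriented` is the FASTA text from bam_to_rec(), `as_stored` the text
    from convert_to_seq().
    """
    stored = parse_fasta(as_stored)
    labels: Dict[str, str] = {}
    for rec_id, seq in parse_fasta(oriented).items():
        original = stored.get(rec_id)
        if seq == original:
            labels[rec_id] = "unchanged"
        elif original is not None and seq == reverse_complement(original):
            labels[rec_id] = "reverse-complemented"
        else:
            labels[rec_id] = "mismatch"
    return labels


if __name__ == "__main__":
    # The two reads shown in the answer.
    oriented = (">SOLEXA-1GA-2:2:93:1281:961#0\n"
                "CTAACCCTAACCCTAACCCTAACCCTAACCTAACCC\n"
                ">SOLEXA-1GA-2:2:12:96:1547#0\n"
                "ACACACAAACACACACACACACACACACACACCCCC\n")
    as_stored = (">SOLEXA-1GA-2:2:93:1281:961#0\n"
                 "GGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG\n"
                 ">SOLEXA-1GA-2:2:12:96:1547#0\n"
                 "ACACACAAACACACACACACACACACACACACCCCC\n")
    print(classify(oriented, as_stored))
```

A "mismatch" label would point to a read that was altered in some other way or is missing from one of the files.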
