如何使用readline()从第二行开始读取?

5 投票
5 回答
13569 浏览
提问于 2025-04-16 16:13

我正在用Python写一个小程序,这个程序会读取一个FASTA文件,FASTA文件通常是这样的格式:

>gi|253795547|ref|NC_012960.1| Candidatus Hodgkinia cicadicola Dsem chromosome, 52 lines
GACGGCTTGTTTGCGTGCGACGAGTTTAGGATTGCTCTTTTGCTAAGCTTGGGGGTTGCGCCCAAAGTGA
TTAGATTTTCCGACAGCGTACGGCGCGCGCTGCTGAACGTGGCCACTGAGCTTACACCTCATTTCAGCGC
TCGCTTGCTGGCGAAGCTGGCAGCAGCTTGTTAATGCTAGTGTTGGGCTCGCCGAAAGCTGGCAGGTCGA

我已经创建了另一个程序,可以读取这个FASTA文件的第一行(也就是头部信息),现在我想让这个第二个程序从序列开始读取并打印出来。

我该怎么做呢?

到目前为止,我有这个:

FASTA = open("test.txt", "r")

def readSeq(FASTA):
    """returns the DNA sequence of a FASTA file"""
    for line in FASTA:
        line = line.strip()
        print line          


readSeq(FASTA)

谢谢大家

-菜鸟

5 个回答

2

你可能会对查看BioPythons如何处理Fasta文件感兴趣(源代码)。

def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None):
    """Generator function to iterate over Fasta records (as SeqRecord objects).

handle - input file
alphabet - optional alphabet
title2ids - A function that, when given the title of the FASTA
file (without the beginning >), will return the id, name and
description (in that order) for the record as a tuple of strings.

If this is not given, then the entire title line will be used
as the description, and the first word as the id and name.

Note that use of title2ids matches that of Bio.Fasta.SequenceParser
but the defaults are slightly different.
"""
    #Skip any text before the first record (e.g. blank lines, comments)
    while True:
        line = handle.readline()
        if line == "" : return #Premature end of file, or just empty?
        if line[0] == ">":
            break

    while True:
        if line[0]!=">":
            raise ValueError("Records in Fasta files should start with '>' character")
        if title2ids:
            id, name, descr = title2ids(line[1:].rstrip())
        else:
            descr = line[1:].rstrip()
            id = descr.split()[0]
            name = id

        lines = []
        line = handle.readline()
        while True:
            if not line : break
            if line[0] == ">": break
            #Remove trailing whitespace, and any internal spaces
            #(and any embedded \r which are possible in mangled files
            #when not opened in universal read lines mode)
            lines.append(line.rstrip().replace(" ","").replace("\r",""))
            line = handle.readline()

        #Return the record and then continue...
        yield SeqRecord(Seq("".join(lines), alphabet),
                         id = id, name = name, description = descr)

        if not line : return #StopIteration
    assert False, "Should not reach this line"
4

你应该把你的脚本展示出来。如果想从第二行开始读取,可以这样做:

f=open("file")
f.readline()
for line in f:
    print line
f.close()
5
def readSeq(FASTA):
    """returns the DNA sequence of a FASTA file"""
    _unused = FASTA.next() # skip heading record
    for line in FASTA:
        line = line.strip()
        print line  

查看关于file.next()的文档,了解为什么在使用file.readline()for line in file:时要小心混用。

撰写回答