How do I extract coordinates from P-Match results?

Scanning sequence ID: BEST1_HUMAN

              150 (-) 1.000 0.997 GGAAAggccc R05891
              354 (+) 0.988 0.981 gtgtAGACAtt R06227
V$CREL_01 c-Rel
V$EVI1_05 Evi-1
Scanning sequence ID: 4F2_HUMAN

              365 (+) 1.000 1.000 gggacCTACA R05884
              789 (-) 1.000 1.000 gcgCGAAA R05828; R05834; R05835; R05838; R05839
V$CREL_01 c-Rel
V$E2F_02 E2F

1 answer

User

#1 · Posted 2024-05-16 01:43:47

Use the snippet from this answer to split the results into evenly sized chunks, then extract the data you need:

def chunks(l, n):
    # Generator that yields successive n-sized chunks from l
    for i in range(0, len(l), n):
        yield l[i:i + n]

with open('p_match.txt') as f:
    # Assumes every record spans exactly 6 lines, with the hits on lines 2 and 3
    for chunk in chunks(f.readlines(), 6):
        sequence_id = chunk[0].split()[-1]
        for i in (2, 3):
            fields = chunk[i].split()
            start = int(fields[0])
            # The matched sequence is the fifth column; the trailing ID list may hold several entries
            sequence = fields[4]
            stop = start + len(sequence)
            print(sequence_id, start, stop)

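Edit: Evidently the results can contain a variable number of start positions, so the fixed-size chunking above will not work. In that case you can go the regex route or walk through the file line by line: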

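with open('p_match.txt') as f:
    text = f.read()
    # Each record starts with the header text, so split on it
    chunks = text.split('Scanning sequence ID:')
    for chunk in chunks:
        if chunk.strip():
            lines = chunk.split('\n')
            sequence_id = lines[0].strip()
            for line in lines:
                # Hit lines are the ones indented with leading spaces
                if line.startswith('              '):
                    fields = line.split()
                    start = int(fields[0])
                    # Fifth column is the matched sequence; the ID list after it can vary in length
                    sequence = fields[4]
                    stop = start + len(sequence)
                    print(sequence_id, start, stop)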
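
The regex route mentioned in the edit could look roughly like the sketch below. It does not depend on the indentation width or on a fixed number of hits per record. The names HEADER_RE, HIT_RE and extract_coordinates are made up for this illustration, and the patterns assume that hit lines follow the layout in the question: start position, strand, two scores, then the matched sequence. As in the code above, stop is taken to be start plus the length of the matched sequence.

```python
import re

# Hypothetical patterns, written against the sample output in the question.
HEADER_RE = re.compile(r"^Scanning sequence ID:\s*(\S+)")
HIT_RE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+[\d.]+\s+[\d.]+\s+([A-Za-z]+)")


def extract_coordinates(text):
    """Return (sequence_id, start, stop) tuples for every hit line in text."""
    results = []
    sequence_id = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A new record begins; remember its ID for the hits that follow
            sequence_id = header.group(1)
            continue
        hit = HIT_RE.match(line)
        if hit and sequence_id is not None:
            start = int(hit.group(1))
            stop = start + len(hit.group(3))
            results.append((sequence_id, start, stop))
    return results


# Sample text laid out like the question's output (the exact layout is an assumption)
sample = """Scanning sequence ID: BEST1_HUMAN

              150 (-) 1.000 0.997 GGAAAggccc R05891
              354 (+) 0.988 0.981 gtgtAGACAtt R06227
V$CREL_01 c-Rel
V$EVI1_05 Evi-1
Scanning sequence ID: 4F2_HUMAN

              365 (+) 1.000 1.000 gggacCTACA R05884
              789 (-) 1.000 1.000 gcgCGAAA R05828; R05834; R05835; R05838; R05839
V$CREL_01 c-Rel
V$E2F_02 E2F
"""

for record in extract_coordinates(sample):
    print(*record)
```

Lines that match neither pattern, such as blank lines and the matrix name lines, are simply skipped, so records with any number of hits are handled the same way.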
