Nested loop for comparing 2 files

#! /usr/bin/env python

import sys
import fileinput

# Open the two files
f1 = open(sys.argv[1], "r")
f2 = open(sys.argv[2], "r")

for line in f1:
    chrR,chrStart,chrEnd,name,score,strand1,codingStart,codingEnd,itemRbg,blockCount,blockSize,BlockStart = line.strip().split()
    chr = range(int(chrStart), int(chrEnd))
    lncRNA = set(chr)
    for line in f2:
        chrC,clustStart,clustEnd,annote,score,strand = line.strip().split()
        clust = range(int(clustStart), int(clustEnd))
        cluster = set(clust)
        if strand1 == '-':
            if chrR == chrC:
                if strand1 == strand:
                    if cluster & lncRNA:
                        print name,annote,'transcript'
                        continue
                    else:
                        continue
        continue
    break

3 Answers

User

Answer 1 · edited 2024-05-28 23:44:36

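You are explicitly doing a `continue` after finding the first match, and then a `break` after the first line of f1.

You don't need either. The inner loop moves on to the next line of f2 by itself, and when it reaches the end of f2 the outer loop moves on to the next line of f1. If you want to check every line in f1 against every line in f2, all those `continue` statements (and the `break`) are redundant.

Try: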

for line in f1:
    chrR,chrStart,chrEnd,name,score,strand1,codingStart,codingEnd,itemRbg,blockCount,blockSize,BlockStart = line.strip().split()
    chr = range(int(chrStart), int(chrEnd))
    lncRNA = set(chr)
    for line2 in f2:
        chrC,clustStart,clustEnd,annote,score,strand = line2.strip().split()
        clust = range(int(clustStart), int(clustEnd))
        cluster = set(clust)
        if strand1 == '-':
            if chrR == chrC:
                if strand1 == strand:
                    if cluster & lncRNA:
                        print name,annote,'transcript'
    # Rewind f2; otherwise it is exhausted after the first line of f1.
    f2.seek(0)

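User

Answer 2 · edited 2024-05-28 23:44:36

After the first line in f1 you have already read every line of the f2 file, so `for line2 in f2` has nothing left to iterate over for the second and subsequent lines of the f1 file, unless the f2 file grows on disk.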

#!/usr/bin/env python
import sys

def intersect(r1, r2):
    # Two non-empty ranges overlap when each one starts before the other ends.
    return r2[0] < (r1[-1]+1) and r1[0] < (r2[-1]+1)

# Read the clusters once so f2 is not re-read for every line of f1.
with open(sys.argv[2]) as f2:
    chrC_set, strand_set, clusters = set(), set(), []
    for i, line in enumerate(f2):
        parts = line.split()
        if len(parts) != 6:
            print >>sys.stderr, "%d line has %d parts: %s" % (i, len(parts), line),
            continue
        chrC, clustStart, clustEnd, annote, _, strand = parts
        chrC_set.add(chrC)
        strand_set.add(strand)
        clusters.append((chrC, strand, xrange(int(clustStart), int(clustEnd)), annote))

with open(sys.argv[1]) as f1:
    for i, line in enumerate(f1):
        parts = line.split()
        if len(parts) < 6:
            print >>sys.stderr, "%d line has %d parts: %s" % (i, len(parts), line),
            continue
        chrR, chrStart, chrEnd, name, _, strand1 = parts[:6]
        # Cheap pre-filter: skip lines that cannot match any cluster at all.
        if strand1 == '-' and chrR in chrC_set and strand1 in strand_set:
            lncRNA = xrange(int(chrStart), int(chrEnd))
            for chrC, strand, cluster, annote in clusters:
                # Compare against this cluster's own chromosome and strand.
                if chrR == chrC and strand1 == strand and intersect(cluster, lncRNA):
                    print name, annote, 'transcript'
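
The exhaustion described at the start of this answer is easy to reproduce in isolation. The following is a minimal sketch in Python 3 (the thread itself uses Python 2), with `io.StringIO` objects standing in for the two files; the sample contents are made up for illustration.

```python
import io

# In-memory stand-ins for the two input files (hypothetical sample data).
f1 = io.StringIO("a\nb\n")
f2 = io.StringIO("x\ny\n")

pairs = []
for line1 in f1:
    for line2 in f2:  # f2 is consumed entirely during the first outer pass
        pairs.append((line1.strip(), line2.strip()))

# Only the first line of f1 was ever paired with anything.
print(pairs)

# Reading f2 once into a list lets the inner loop restart for every outer line.
f1.seek(0)
f2.seek(0)
f2_lines = f2.read().splitlines()
all_pairs = [(line1.strip(), line2) for line1 in f1 for line2 in f2_lines]
print(all_pairs)
```

A list can be iterated any number of times, whereas a file object is its own iterator and stays at end-of-file until it is rewound.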

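User

Answer 3 · edited 2024-05-28 23:44:36

The test `if strand1 == '-'` does not depend on the contents of f2. It can therefore be placed before the loop over f2, so that f2 is scanned only when the current line of f1 has strand1 equal to '-'.

Also, since `if strand1 == '-'` comes first and is followed by `if strand1 == strand`, you are only interested in the lines of f2 whose strand is '-'.

In addition, I took J.F. Sebastian's idea of testing the intersection of two ranges without resorting to sets, by testing only the bounds of the ranges. There is no need for range or xrange at all, though; comparing the bounds is enough.

So I propose the following code as a simple improvement of your algorithm: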

for line in f1:
    (chrR,chrStart,chrEnd,name,score,strand1,codingStart,codingEnd,
     itemRbg,blockCount,blockSize,BlockStart) = line.strip().split()
    if strand1 == '-':
        s,e = int(chrStart), int(chrEnd)
        for line in f2:
            chrC,clustStart,clustEnd,annote,score,strand = line.strip().split()
            if strand=='-' and chrR == chrC \
               and int(clustStart)<e and s<int(clustEnd):
                print name,annote,'transcript'
        f2.seek(0,0)
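
The bounds comparison used in the condition above can be isolated as a small helper. This is a sketch in Python 3; the name `overlaps` is hypothetical, and it assumes half-open intervals with start < end, which matches what `range(start, end)` produces in the question.

```python
def overlaps(start1, end1, start2, end2):
    """Return True if [start1, end1) and [start2, end2) share at least one integer.

    Assumes both intervals are non-empty (start < end).
    """
    return start1 < end2 and start2 < end1


print(overlaps(10, 20, 15, 30))  # partially overlapping intervals
print(overlaps(10, 20, 20, 30))  # touching half-open intervals do not overlap

# For non-empty intervals this agrees with the set-based test from the question,
# without building two sets of integers for every pair of lines.
print(bool(set(range(10, 20)) & set(range(15, 30))))
```

The comparison costs two integer tests per pair, while the set version allocates memory proportional to the interval lengths.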


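However, this algorithm (yours, corrected) is poor: for every line of f1 whose strand is '-', the whole of f2 is read again.

J.F. Sebastian's algorithm is much better.
I improved it a little using the ideas expressed above.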

(The code block that followed here was not preserved.)
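
The answerer's improved version is not available here, so the following is not that code. It is only a sketch, in Python 3, of how the ideas stated in this answer might be combined: read f2 once, keep only '-' strand clusters, and compare bounds directly. The function names, the grouping by chromosome, and the sample data are all assumptions made for illustration.

```python
import io
from collections import defaultdict


def load_minus_clusters(f2):
    """Group '-' strand clusters by chromosome: {chrom: [(start, end, annote), ...]}."""
    by_chrom = defaultdict(list)
    for line in f2:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip malformed lines
        chrom, start, end, annote, _score, strand = parts
        if strand == '-':
            by_chrom[chrom].append((int(start), int(end), annote))
    return by_chrom


def find_overlaps(f1, by_chrom):
    """Yield (name, annote) for each '-' strand f1 record overlapping a cluster."""
    for line in f1:
        parts = line.split()
        if len(parts) < 6 or parts[5] != '-':
            continue
        chrom, s, e, name = parts[0], int(parts[1]), int(parts[2]), parts[3]
        for c_start, c_end, annote in by_chrom.get(chrom, ()):
            if c_start < e and s < c_end:  # bounds test, no range or set needed
                yield name, annote


# Hypothetical sample data standing in for the two files.
f2 = io.StringIO(
    "chr1 100 200 clustA 0 -\n"
    "chr1 500 600 clustB 0 +\n"
    "chr2 100 200 clustC 0 -\n"
)
f1 = io.StringIO(
    "chr1 150 250 lnc1 0 - 0 0 0 1 100 0\n"
    "chr2 300 400 lnc2 0 - 0 0 0 1 100 0\n"
)
result = list(find_overlaps(f1, load_minus_clusters(f2)))
for name, annote in result:
    print(name, annote, 'transcript')
```

Grouping by chromosome means each f1 record is compared only against clusters that could match it, and f2 is read exactly once.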
