Printing feed-forward loops in Python
I have a very large input file that looks roughly like this: (you can download it here)
1. FLO8;PRI2
2. FLO8;EHD3
3. GRI2;BET2
4. HAL4;AAD3
5. PRI2;EHD3
6. QLN3;FZF1
7. QLN3;ABR5
8. FZF1;ABR5
...
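Think of it as a two-column table in which the element before the semicolon points to the element after it.
I want to print simple strings, one at a time, each showing the three elements that make up a feed-forward loop. For the numbered list above, the output would be: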
"FLO8 PRI2 EHD3"
"QLN3 FZF1 ABR5"
...
To explain how the first output line forms a feed-forward loop:
A -> B (FLO8;PRI2)
B -> C (PRI2;EHD3)
A -> C (FLO8;EHD3)
Just look at the one circled in this link.
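To make the definition concrete, here is a minimal sketch that checks the three edges directly against a set of (source, target) pairs built from the eight sample rows above. The names `edges` and `is_ffl` are made up for illustration and are not part of the original code.

```python
# Sample rows from the question, stored as (source, target) pairs.
edges = {
    ("FLO8", "PRI2"), ("FLO8", "EHD3"), ("GRI2", "BET2"), ("HAL4", "AAD3"),
    ("PRI2", "EHD3"), ("QLN3", "FZF1"), ("QLN3", "ABR5"), ("FZF1", "ABR5"),
}

def is_ffl(a, b, c):
    """Return True if A -> B, B -> C and A -> C all exist and A, B, C are distinct."""
    distinct = len({a, b, c}) == 3
    return distinct and (a, b) in edges and (b, c) in edges and (a, c) in edges

print(is_ffl("FLO8", "PRI2", "EHD3"))  # True
print(is_ffl("GRI2", "BET2", "AAD3"))  # False
```

Requiring three distinct names also rules out self-loops such as A -> A.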
Here is what I have so far, but it runs very slowly. Any suggestions for making it faster?
import csv

TF = []
TAR = []

# READING THE FILE
with open("MYFILE.tsv") as tsv:
    for line in csv.reader(tsv, delimiter=";"):
        TF.append(line[0])
        TAR.append(line[1])

# I WANT A BETTER WAY TO RUN THIS.. All these for loops are killing me
for i in range(len(TAR)):
    for j in range(len(TAR)):
        if TAR[j] != TF[j] and TAR[i] != TF[i] and TAR[i] != TAR[j] and TF[j] == TF[i]:
            for k in range(len(TAR)):
                if not (k == i or k == j) and TF[k] == TAR[j] and TAR[k] == TAR[i]:
                    print("FFL: " + TF[i] + " " + TAR[j] + " " + TAR[i])
Note: I don't want self-loops, i.e. A -> A, B -> B or C -> C.
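One way to honor the self-loop rule is to drop such rows while reading, so later steps never see them. The following is only a sketch: the helper name `read_edges` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import csv
import io

def read_edges(handle):
    """Yield (source, target) pairs from ';'-separated rows, skipping self-loops."""
    for row in csv.reader(handle, delimiter=";"):
        # Ignore blank or malformed rows, and rows where source == target.
        if len(row) >= 2 and row[0] != row[1]:
            yield row[0], row[1]

sample = io.StringIO("FLO8;PRI2\nFLO8;FLO8\nPRI2;EHD3\n")
print(list(read_edges(sample)))  # [('FLO8', 'PRI2'), ('PRI2', 'EHD3')]
```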
2 Answers
0
With this test set:
targets = {'A':['B','C','D'],'B':['C','D'],'C':['A','D']}
and this loop:
for i in targets:
    for y in targets[i]:
        # Compare the targets of i with the targets of y and keep the overlap.
        # Nodes with no outgoing edges (e.g. 'D') are not keys, so default to an empty list.
        diff = sorted(set(targets[i]) & set(targets.get(y, [])))
        # If at least one target is shared, format the pair and the shared targets with arrows.
        if diff and i != y:
            feed = i + '->' + y + '-->'
            forward = '+'.join(diff)
            feedForward = feed + forward
            print(feedForward)
The output is (the order of the lines depends on the dictionary's ordering):
A->B-->C+D
A->C-->D
C->A-->D
B->C-->D
Greetings to Robin from the Radboud Computational Biology course (Q1 2016).
2
I use a dict of sets, which makes lookups very fast, like this:
Edit: prevent self-loops:
from collections import defaultdict

INPUT = "RegulationTwoColumnTable_Documented_2013927.tsv"

# load the data as { "ABF1": set(["ABF1", "ACS1", "ADE5,7", ... ]) }
data = defaultdict(set)
with open(INPUT) as inf:
    for line in inf:
        a, b = line.rstrip().split(";")
        if a != b:  # no self-loops
            data[a].add(b)

# find all triplets such that A -> B -> C and A -> C
found = []
for a, bs in data.items():
    bint = bs.intersection
    for b in bs:
        # data.get() rather than data[b]: indexing a defaultdict with a missing key
        # inserts it, which raises RuntimeError while iterating on Python 3
        for c in bint(data.get(b, ())):
            found.append("{} {} {}".format(a, b, c))
On my machine, loading the data takes only 0.36 seconds, and finding 1,933,493 solutions then takes 2.90 seconds; the results look like this:
['ABF1 ADR1 AAC1',
'ABF1 ADR1 ACC1',
'ABF1 ADR1 ACH1',
'ABF1 ADR1 ACO1',
'ABF1 ADR1 ACS1',
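To check the approach on something small, here is a sketch that runs the same dict-of-sets idea on the eight sample rows from the question. The function name `find_ffls` and the `sample` list are made up for illustration, and the sketch assumes the rows in the file carry no leading "1." style numbering.

```python
from collections import defaultdict

def find_ffls(lines):
    """Return sorted 'A B C' strings for every A -> B, B -> C, A -> C triple."""
    targets = defaultdict(set)
    for line in lines:
        a, b = line.strip().split(";")
        if a != b:  # skip self-loops
            targets[a].add(b)
    found = []
    for a, bs in targets.items():
        for b in bs:
            # .get() avoids inserting new keys into the defaultdict while iterating.
            for c in bs & targets.get(b, set()):
                found.append("{} {} {}".format(a, b, c))
    return sorted(found)

sample = ["FLO8;PRI2", "FLO8;EHD3", "GRI2;BET2", "HAL4;AAD3",
          "PRI2;EHD3", "QLN3;FZF1", "QLN3;ABR5", "FZF1;ABR5"]
for triple in find_ffls(sample):
    print(triple)
# FLO8 PRI2 EHD3
# QLN3 FZF1 ABR5
```

Each source's targets are intersected with the targets of one of its own targets, so the work scales with the number of edges and the size of the sets rather than with the cube of the row count.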
Edit 2: I'm not sure this is what you want, but if you need A -> B, A -> C and B -> C, with no B -> A, C -> A or C -> B, you can try:
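found = []
for a, bs in data.items():
    bint = bs.intersection
    for b in bs:
        b_targets = data.get(b, ())  # .get() avoids adding keys to the defaultdict mid-iteration
        if a not in b_targets:
            for c in bint(b_targets):
                c_targets = data.get(c, ())
                if a not in c_targets and b not in c_targets:
                    found.append("{} {} {}".format(a, b, c))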
That still returns 1,380,846 solutions, though.
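To see what the stricter filter actually removes, here is a small sketch on a hand-made graph. The graph, the function name `find_strict_ffls` and the tuple output are assumptions for illustration only. The triple A, B, C is a clean feed-forward loop, while X, Y, Z has the extra back edge Z -> X and should be rejected.

```python
def find_strict_ffls(targets):
    """Return sorted (A, B, C) triples with A -> B, B -> C, A -> C and no back edges."""
    found = []
    for a, bs in targets.items():
        for b in bs:
            b_targets = targets.get(b, set())
            if a in b_targets:
                continue  # B -> A exists, so skip this pair
            for c in bs & b_targets:
                c_targets = targets.get(c, set())
                # Reject the triple if C points back to A or to B.
                if a not in c_targets and b not in c_targets:
                    found.append((a, b, c))
    return sorted(found)

targets = {
    "A": {"B", "C"}, "B": {"C"},                # clean loop
    "X": {"Y", "Z"}, "Y": {"Z"}, "Z": {"X"},    # same shape plus Z -> X
}
print(find_strict_ffls(targets))  # [('A', 'B', 'C')]
```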