Python: combining lines by group, and making the script more efficient

1 590 SC 1.000 LEU2_YEAST 100%
1 590 EC 1.000 LEU2_ECOLI 100%
2 467 SC 1.000 FADH_YEAST 100%
2 467 EC 1.000 ADH3_ECOLI 100%
3 463 SC 1.000 6PG1_YEAST 100%
3 463 SC 0.816 6PG2_YEAST
3 463 EC 1.000 6PGD_ECOLI 100%
3 463 EC 0.903 6PG9_ECOLI
4 446 SC 1.000 YME1_YEAST 59%
4 446 EC 1.000 FTSH_ECOLI 100%
5 411 SC 1.000 ADH4_YEAST 100%
5 411 EC 1.000 ADH2_ECOLI 99%
8 256 SC 1.000 ATM1_YEAST 100%
8 256 EC 1.000 HLYB_ECOLI 99%
8 256 EC 0.987 HLY2_ECOLI
9 252 SC 1.000 MDL2_YEAST 100%
9 252 SC 0.203 MDL1_YEAST
9 252 EC 1.000 MSBA_ECOLI 99%

import sys

Dict1 = {}
for line in open(sys.argv[1]):
    line = line.strip().split()
    if line[0] not in Dict1.keys():
        Dict1[line[0]] = [line[4]]
    elif line[0] in Dict1.keys():
        Dict1[line[0]].append(line[4])

for i in Dict1.values():
    if len(i) == 2:
        print i[0] + "\t" + i[1]

2 Answers

User

Answer 1 · Edited 2024-05-23 22:27:34

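One possible improvement is to change if line[0] not in Dict1.keys() to if line[0] not in Dict1. In Python 2, Dict1.keys() builds a list, so not in Dict1.keys() is an O(n) operation, while not in Dict1 is a hash lookup that is O(1) on average.

I'm not sure how much real performance gain this gives. You should use time to find out.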
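The change suggested above can be sketched as follows. This is a minimal Python 3 sketch, not the asker's script: the function name pair_groups and the inline sample lines are assumptions made for illustration, and the input is a list of strings rather than a file named in sys.argv[1].

```python
def pair_groups(lines):
    """Collect the 5th column by the group number in the 1st column,
    then keep only the groups that have exactly two members."""
    groups = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = fields[0]
        # Membership test on the dict itself: a hash lookup, no list of keys is built.
        if key not in groups:
            groups[key] = [fields[4]]
        else:
            groups[key].append(fields[4])
    return [names[0] + "\t" + names[1] for names in groups.values() if len(names) == 2]


# Hypothetical sample taken from the first rows of the question's data.
sample = [
    "1 590 SC 1.000 LEU2_YEAST 100%",
    "1 590 EC 1.000 LEU2_ECOLI 100%",
    "3 463 SC 1.000 6PG1_YEAST 100%",
    "3 463 SC 0.816 6PG2_YEAST",
    "3 463 EC 1.000 6PGD_ECOLI 100%",
]
result = pair_groups(sample)
print("\n".join(result))
```

The plain else replaces the second membership test in the original elif, since the key is either present or absent.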
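One way to measure the difference is the standard-library timeit module. That choice is an assumption, since the answer does not name a specific tool. The sketch below targets Python 3, where keys() returns a view rather than a list, so list(d.keys()) is used to reproduce the cost of the Python 2 list. The dictionary size and repeat count are arbitrary.

```python
import timeit

# Hypothetical dictionary with 10,000 string keys, standing in for Dict1.
d = {str(i): [] for i in range(10000)}
key = "9999"

# Python 2's Dict1.keys() returned a list; list(d.keys()) reproduces that cost here.
t_list = timeit.timeit(lambda: key not in list(d.keys()), number=200)

# Membership test directly on the dict: a hash lookup.
t_dict = timeit.timeit(lambda: key not in d, number=200)

print("list of keys: %.6f s" % t_list)
print("dict lookup:  %.6f s" % t_dict)
```

The absolute numbers depend on the machine. Only the ratio between the two timings is meaningful, and for a file with few distinct group numbers the gain may be small.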

User

Answer 2 · Edited 2024-05-23 22:27:34

If the file is sorted by the number in the first column, you can use itertools.groupby:

import sys
from itertools import groupby
import operator

with open(sys.argv[1]) as infile:
    # split lines and group them by the number in the first column
    groups = groupby([line.strip().split() for line in infile], operator.itemgetter(0))
# convert groups to lists and discard keys
groups = [list(lines) for _, lines in groups]
# discard groups that don't have 2 items and format the output
groups = ['%s\t%s' % (lines[0][4], lines[1][4]) for lines in groups if len(lines) == 2]
# alternatively you can use (list() is needed on Python 3, where zip returns an iterator)
#   groups = ['\t'.join(list(zip(*lines))[4]) for lines in groups if len(lines) == 2]

print('\n'.join(groups))
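The same grouping can also be written as a generator that never holds the whole file in memory. This is a minimal Python 3 sketch under the same assumption as the answer, namely that rows with the same first column are adjacent. The function name paired_names and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def paired_names(lines):
    """Yield tab-joined 5th-column values for each run of exactly two
    consecutive rows that share the same first column."""
    # Generator expression: rows are split lazily, blank lines are skipped.
    rows = (line.split() for line in lines if line.strip())
    for _, group in groupby(rows, key=itemgetter(0)):
        names = [row[4] for row in group]
        if len(names) == 2:
            yield "\t".join(names)


# Hypothetical sample taken from the question's data (groups 2, 3 and 4).
sample = [
    "2 467 SC 1.000 FADH_YEAST 100%",
    "2 467 EC 1.000 ADH3_ECOLI 100%",
    "3 463 SC 1.000 6PG1_YEAST 100%",
    "3 463 SC 0.816 6PG2_YEAST",
    "3 463 EC 1.000 6PGD_ECOLI 100%",
    "3 463 EC 0.903 6PG9_ECOLI",
    "4 446 SC 1.000 YME1_YEAST 59%",
    "4 446 EC 1.000 FTSH_ECOLI 100%",
]
pairs = list(paired_names(sample))
print("\n".join(pairs))
```

To run it on a file, an open file object can be passed in place of sample, because groupby only needs an iterable of lines. If the input is not sorted by the first column, groupby splits a group into several runs, and the dictionary approach from the question is the safer choice.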
