Python file reading and "list index out of range"

Published 2024-03-28 16:23:22


I have an input file like this:

COG1: aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816
COG2: aomo|At1g04160 somo|YAL029c somo|YOR326w homo|Hs10835119 aomo|At1g10260
COG3: somo|YAR009c somo|YJL113w aomo|At1g10260 aomo|At1g11265

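From this I need a simple count of each prefix per group, written to an output file like:

    aomo homo somo
    COG1 3 2 0
    COG2 2 1 2
    COG3 2 0 2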

For this I use:

    import re
    l=[]       # every prefix seen so far (aomo, homo, somo, ...)
    dict={}    # group key -> {prefix: count}
    with open("groups.txt","r") as f:
        for line in f:
            items=line.split(":")
            key=items[0]
            if key not in dict:
                dict[key]={}
            string=items[1]
            words=re.findall(r"\S+\|\S+",string)
            for w in words:
                tmp=w.split("|")
                if tmp[0] not in l:
                    l.append(tmp[0])
                if tmp[0] in dict[key]:
                    dict[key][tmp[0]]=1+dict[key][tmp[0]]
                else:
                    dict[key][tmp[0]]=1
    for i in sorted(l):
        print(i,end=" ")
    print("")
    for k in sorted(dict.keys()):
        print(k,end=" ")
        for i in sorted(l):
            if i in dict[k]:
                print(dict[k][i],end=" ")
            else:
                print("0", end=" ")
        print("")

It works fine. But when I change the input file to:

COG1: aomo_At1g01190 aomo_At1g01280 aomo_At1g11600 homo_Hs10834998 homo_Hs13699816
COG2: aomo_At1g04160 somo_YAL029c somo_YOR326w homo_Hs10835119
COG3: somo_YAR009c somo_YJL113w aomo_At1g10260 aomo_At1g11265

and change the code to:

words=re.findall(r"\S+_\S+",string)
for w in words:
    tmp=w.split("_")

it gives the following error:

  File "my_program.py", line 10, in <module>
    string=items[1]
IndexError: list index out of range

3 Answers

Here is a simple way to do it:

>>> my_string = "COG1: aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 "
>>> a,b = my_string.split(":")    # will split strings on ":"
>>> a
'COG1'
>>> b
' aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 '
>>> import re
>>> from collections import Counter
>>> my_count = Counter(re.findall("aomo|homo|somo",b)) # findall returns every match; Counter maps each distinct match to its count
>>> my_count
Counter({'aomo': 3, 'homo': 2})
>>> "{} {} {} {}".format(a,my_count.get('aomo',0),my_count.get('homo',0),my_count.get('somo',0))
'COG1 3 2 0'
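
The same idea can be applied to every line rather than a single string. What follows is only a sketch under stated assumptions: the function name `count_prefixes`, its `sep` parameter, and the inline sample lines are illustrative and not part of the original answer, and entries are assumed to be separated by whitespace. It takes the prefix of each entry instead of hardcoding the three labels, and it skips lines that contain no ":".

```python
from collections import Counter

def count_prefixes(lines, sep="|"):
    """Map each group key to a Counter of entry prefixes (the text before `sep`)."""
    table = {}
    for line in lines:
        key, colon, rest = line.partition(":")
        if not colon:  # blank or malformed line: nothing to count
            continue
        table[key.strip()] = Counter(entry.split(sep)[0] for entry in rest.split())
    return table

# Hypothetical sample data, including a blank line like the one that breaks split(":")[1].
sample = [
    "COG1: aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816",
    "",
    "COG3: somo|YAR009c somo|YJL113w aomo|At1g10260 aomo|At1g11265",
]
table = count_prefixes(sample)

# Column labels are whatever prefixes actually occur, in sorted order.
columns = sorted({prefix for counts in table.values() for prefix in counts})
print(" ".join(columns))
for key in sorted(table):
    # A Counter returns 0 for a prefix that never occurred in that group.
    print(key, *(table[key][c] for c in columns))
```

For the underscore-separated file, the same function would be called with `sep="_"`.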

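There are probably some blank lines in the second file. Splitting such a line on ":" gives a list of length 1 (['\n'] for a bare newline, or [''] once the line is stripped), so accessing items[1] raises IndexError.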
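
A minimal sketch of that failure mode and one way to guard against it. The helper name `parse_line` is hypothetical (it does not appear in the question), and it assumes every valid line has the form `KEY: entries`.

```python
def parse_line(line):
    """Return (key, rest) for a 'KEY: entries' line, or None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    key, sep, rest = line.partition(":")
    if not sep:  # no colon at all, so there is no second field
        return None
    return key.strip(), rest.strip()

# A blank line splits into a single-element list, so index 1 does not exist.
items = "\n".split(":")
print(len(items))

print(parse_line("\n"))
print(parse_line("COG3: somo_YAR009c somo_YJL113w"))
```

In the question's loop, a `None` result would simply mean `continue` to the next line.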

You don't need the almighty re module to do this.

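template = '{0:4} {1:4} | {2:4} | {3:4}'
columns = ['aomo', 'homo', 'somo']

with open('groups.txt') as f:
    print(template.format(' ', *columns))  # header row
    for line in f:
        if ':' not in line:  # skip blank lines, which would break the unpacking below
            continue
        key, value = line.split(':', 1)
        # str.count returns how many times each label occurs in the rest of the line
        counts = [value.count(column_label) for column_label in columns]
        print(template.format(key.strip(), *counts))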
