<p>I have an input file such as:</p>
<pre><code>COG1:aomo|At1g01190|aomo|At1g01280|aomo|At1g11600|homo|Hs10834998|homo|Hs13699816
COG2:aomo|At1g04160|somo|YAL029c|somo|YOR326w|homo|Hs10835119|aomo|At1g10260
COG3:somo|YAR009c|somo|YJL113w|aomo|At1g10260|aomo|At1g11265
</code></pre>
<p>I need to do a simple count per group and produce an output file like:</p>
^{pr2}$
<p>For this I use:</p>
<pre><code>import re
l=[]      # every species tag seen so far
dict={}   # group name -> {species tag: count}
with open("groups.txt","r") as f:
    for line in f:
        items=line.split(":")
        key=items[0]
        if key not in dict:
            dict[key]={}
        string=items[1]
        words=re.findall(r"\S+\|\S+",string)
        for w in words:
            tmp=w.split("|")
            if tmp[0] not in l:
                l.append(tmp[0])
            if tmp[0] in dict[key]:
                dict[key][tmp[0]]=1+dict[key][tmp[0]]
            else:
                dict[key][tmp[0]]=1
# header row: all species tags, sorted
for i in sorted(l):
    print(i,end=" ")
print("")
# one row per group, with 0 for species that do not occur in it
for k in sorted(dict.keys()):
    print(k,end=" ")
    for i in sorted(l):
        if i in dict[k]:
            print(dict[k][i],end=" ")
        else:
            print("0", end=" ")
    print("")
</code></pre>
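<p>One thing that may be worth checking in the pattern above: <code>\S+</code> is greedy, so on a field that contains no whitespace it can match the whole field as a single item. The sketch below illustrates this on a shortened sample field and shows one possible alternative pattern that excludes <code>|</code> from each side. The variable names (<code>field</code>, <code>greedy</code>, <code>pairs</code>, <code>species</code>) are only for illustration and are not part of the program above.</p>

```python
import re

# A shortened field in the first input format (no spaces anywhere).
field = "aomo|At1g01190|aomo|At1g01280|homo|Hs10834998"

# Greedy \S+ swallows the whole whitespace-free run, so findall yields one item.
greedy = re.findall(r"\S+\|\S+", field)

# Excluding '|' (and whitespace) on both sides yields one match per species|id pair.
pairs = re.findall(r"[^\s|]+\|[^\s|]+", field)

# The species tag is the part before the '|' in each pair.
species = [p.split("|")[0] for p in pairs]

print(len(greedy), len(pairs), species)
```

<p>If the real file separates the pairs with spaces, the original pattern behaves differently, so this only applies under the no-whitespace assumption.</p>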
<p>It works fine. But when I change the input file to, for example:</p>
<pre><code>COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816
COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119
COG3:somo_YAR009c|somo_YJL113w|aomo_At1g10260|aomo_At1g11265
</code></pre>
<p>and change the code to:</p>
<pre><code>words=re.findall(r"\S+\_\S+",string)
for w in words:
    tmp=w.split("_")
</code></pre>
<p>it gives the following error:</p>
<pre><code>File "my_program.py", line 10, in &lt;module&gt;
    string=items[1]
IndexError: list index out of range
</code></pre>
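<p>An <code>IndexError</code> on <code>items[1]</code> means that <code>line.split(":")</code> returned a single element, that is, some line contained no colon. One possible cause, assuming the new file has a blank or otherwise colon-free line (for instance at the end), is that this line reaches the split unchanged. The following is a minimal sketch of a more defensive parser for the underscore format. The function name <code>count_species</code> and its parameters are hypothetical, and the sample data includes a blank line on purpose.</p>

```python
from collections import Counter

def count_species(lines, sep="|", tag_sep="_"):
    """Return {group: Counter({species: n})} for lines like 'COG1:aomo_At1|homo_Hs1'."""
    counts = {}
    for line in lines:
        # partition never raises: 'colon' is empty when the line has no ':'
        key, colon, rest = line.strip().partition(":")
        if not colon:
            continue  # blank or malformed line, skip it instead of indexing
        group = counts.setdefault(key, Counter())
        for entry in rest.split(sep):
            # species tag is everything before the first tag separator
            species, found, _ = entry.partition(tag_sep)
            if found:
                group[species] += 1
    return counts

# Sample lines in the second input format, with a blank line in between.
sample = [
    "COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816\n",
    "\n",
    "COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119\n",
]
counts = count_species(sample)
print(counts["COG1"]["aomo"], counts["COG2"]["somo"])
```

<p>Splitting on <code>|</code> first and then on <code>_</code> avoids the regular expression entirely, and a <code>Counter</code> returns 0 for species that do not occur in a group.</p>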