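<p>From your example, I assume that:</p>
<ul>
<li>you want to save each table to a separate result file;</li>
<li>each sequence is 65 characters long;</li>
<li>some sequences contain meaningless whitespace that has to be removed (line 3 in your example).</li>
</ul>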
<p>Here is my code example. It reads the data from <code>input.dat</code> and writes the results to <code>result-column-&lt;number&gt;.dat</code>:</p>
<pre><code>import re
import sys

# Each table (one per sequence column) is written to its own result file.
# Maps a column number to its open file object:
resultfiles = {}

def get_result_file(column):
    # Helper: open the result file for a column on first use, then reuse it.
    if column not in resultfiles:
        resultfiles[column] = open('result-column-%d.dat' % column, 'w')
    return resultfiles[column]

# Iterate over the data:
with open('input.dat') as infile:
    for line in infile:
        try:
            # str.split(separator, maxsplit): with maxsplit=2 everything
            # after the score stays in one piece, which is more fail-proof:
            no, score, seq = line.split(None, 2)
            # Whitespace inside a sequence looks meaningless, but one
            # sequence in your example contains some, so remove it:
            seq = re.sub(r'\s+', '', seq)
            # Data validation helps to spot problems early.
            # int() and float() raise ValueError if no or score is not a
            # number; a failed assert raises AssertionError.
            int(no)
            float(score)
            assert len(seq) == 65, seq
        except (ValueError, AssertionError) as e:
            # Print the error and continue processing the data:
            sys.stderr.write('Error %s in line: %s\n' % (e, line.rstrip()))
            continue  # jump to the next iteration of the for loop
        # Iterate over each character of the amino acid sequence:
        for column, char in enumerate(seq, 1):
            f = get_result_file(column)
            f.write('%s %s %s\n' % (no, score, char))

# Close all opened result files:
for f in resultfiles.values():
    f.close()
</code></pre>
<p>Noteworthy functions used in this example:</p>
<ul>
<li><a href="http://docs.python.org/library/functions.html#enumerate" rel="nofollow">enumerate</a></li>
<li><a href="http://docs.python.org/library/stdtypes.html#str.split" rel="nofollow">str.split</a></li>
<li><a href="http://docs.python.org/library/re.html#re.sub" rel="nofollow">re.sub</a></li>
</ul>
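<p>To see what these three functions do in isolation, here is a minimal sketch on a single made-up line. The sample values and the shortened 7-character sequence are assumptions for illustration only, not data from your file:</p>

```python
import re

# Hypothetical sample line: number, score, then a sequence with a stray space.
line = "3 0.75 ACDE FGH\n"

# maxsplit=2 splits off only the first two fields; the rest stays in seq,
# including the inner space and the trailing newline.
no, score, seq = line.split(None, 2)

# Remove every run of whitespace from the sequence.
seq = re.sub(r"\s+", "", seq)

# enumerate(seq, 1) numbers the characters starting at 1 instead of 0,
# which gives the column number used in the result file name.
columns = list(enumerate(seq, 1))

print(no, score, seq)
print(columns[:3])
```

<p>Each <code>(column, char)</code> pair then becomes one line in the result file for that column.</p>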