<p>This is easier to handle with <a href="http://docs.python.org/2/library/itertools.html#itertools.groupby" rel="nofollow">itertools.groupby</a>. <code>groupby</code> can cluster all consecutive rows that share the same mrn, specimen_id and lab_num.</p>
<p>The code that does this is</p>
<pre><code>for key, group in IT.groupby(reader, key = mykey):
</code></pre>
<p>where <code>reader</code> iterates over the lines of the input file, and <code>mykey</code> is defined by</p>
<pre><code>def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])
</code></pre>
<p>Each row from <code>reader</code> is passed to <code>mykey</code>, and all consecutive rows with the same key are collected in the same <code>group</code>.</p>
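<p>To make the "consecutive rows" behaviour concrete, here is a minimal sketch. The rows are made-up sample data (not taken from the real input file), and <code>grouped</code> is just an illustrative name. Note that the last row has the same key as the first two but ends up in its own group because it is not adjacent to them; if the input is not already ordered by key, it would need to be sorted first.</p>

```python
import itertools as IT

# Hypothetical sample rows, already parsed into dicts
rows = [
    {'mrn': 'A', 'specimen_id': '1', 'lab_num': 'X', 'labtest': 'Lipase'},
    {'mrn': 'A', 'specimen_id': '1', 'lab_num': 'X', 'labtest': 'Calcium'},
    {'mrn': 'B', 'specimen_id': '2', 'lab_num': 'Y', 'labtest': 'Lipase'},
    {'mrn': 'A', 'specimen_id': '1', 'lab_num': 'X', 'labtest': 'Magnesium'},
]

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

# Collect (key, [labtest names]) for each run of consecutive rows with equal keys
grouped = [
    (key, [row['labtest'] for row in group])
    for key, group in IT.groupby(rows, key=mykey)
]

for key, labtests in grouped:
    print(key, labtests)
```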
<hr/>
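<p>While we are at it, we might as well use the <a href="http://docs.python.org/2/library/csv.html" rel="nofollow">csv module</a> to read each line into a dict (which I call <code>row</code>). This frees us from low-level string manipulation such as <code>line.rstrip("\n").split("\t")</code>, and instead of referring to columns by index number (e.g. <code>row[3]</code>), we can write code in higher-level terms such as <code>row['lab_num']</code>.</p>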
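<p>As a small illustration of what <code>csv.DictReader</code> gives you, the sketch below parses a two-line tab-separated string instead of a real file. The sample text and the names <code>sample</code> and <code>first</code> are assumptions made for the example, and it uses <code>io.StringIO</code>, so it is written for Python 3.</p>

```python
import csv
import io

# Hypothetical tab-separated input: a header line followed by one data line
sample = "mrn\tlab_num\tlabtest\tresult_val\n3319529\t5802791G\tLipase\t153\n"

reader = csv.DictReader(io.StringIO(sample), delimiter='\t')
rows = list(reader)
first = rows[0]

# Columns are addressed by header name rather than by position
print(first['lab_num'], first['labtest'], first['result_val'])
```

<p>All values come back as strings, so convert <code>result_val</code> yourself if you need numbers.</p>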
<hr/>
<pre><code>import itertools as IT
import csv

inFile = 'curious.dat'
outFile = 'curious.out'

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

fieldnames = 'mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate'.split()

# Python 2 file modes; on Python 3 use open(inFile, newline='')
# and open(outFile, 'w', newline='') instead.
with open(inFile, 'rb') as ifd:
    reader = csv.DictReader(ifd, delimiter = '\t')
    with open(outFile, 'wb') as ofd:
        writer = csv.DictWriter(
            ofd, fieldnames, delimiter = '\t', lineterminator = '\n', )
        writer.writeheader()
        for key, group in IT.groupby(reader, key = mykey):
            new = {}
            row = next(group)
            # Copy the identifying columns from the first row of the group
            for field in ('mrn', 'specimen_id', 'date', 'lab_num'):
                new[field] = row[field]
            new[row['labtest']] = row['result_val']
            # Each remaining row contributes one more labtest column
            for row in group:
                new[row['labtest']] = row['result_val']
            writer.writerow(new)
</code></pre>
<p>which yields</p>
<pre><code>mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate
4419529 1614487 26.2675 5802791G 0.1
3319529 1614487 26.2675 5802791G 0.3 153 8.1 2.1 4
5713871 682571 56.0779 9732266E 4.1
</code></pre>