<p>Python的<a href="http://docs.python.org/library/csv.html#module-csv" rel="nofollow">CSV</a>和collections模块,特别是<a href="http://docs.python.org/library/collections.html#collections.OrderedDict" rel="nofollow">OrderedDict</a>,在这里非常有用。你想使用OrderedDict来维护密钥的顺序,等等。你不必这么做,但它很有用!</p>
<pre><code>import csv
from collections import OrderedDict
signature_row_map = OrderedDict()
with open('hosts.csv') as file_object:
for line in csv.DictReader(file_object, delimiter='\t'):
signature_row_map[line['Signature']] = {'line': line, 'found_at': None}
with open('masterlist.csv') as file_object:
for i, line in enumerate(csv.DictReader(file_object, delimiter='\t'), 1):
if line['Signature'] in signature_row_map:
signature_row_map[line['Signature']]['found_at'] = i
with open('newhosts.csv', 'w') as file_object:
fieldnames = ['Path', 'Filename', 'Size', 'Signature', 'RESULTS']
writer = csv.DictWriter(file_object, fieldnames, delimiter='\t')
writer.writer.writerow(fieldnames)
for signature_info in signature_row_map.itervalues():
result = '{0} FOUND in masterlist {1}'
# explicit check for sentinel
if signature_info['found_at'] is not None:
result = result.format('', '(row %s)' % signature_info['found_at'])
else:
result = result.format('NOT', '')
payload = signature_info['line']
payload['RESULTS'] = result
writer.writerow(payload)
</code></pre>
<p>下面是使用测试CSV文件的输出:</p>
<pre><code>Path Filename Size Signature RESULTS
C:\ a.txt 14kb 012345 NOT FOUND in masterlist
D:\ b.txt 99kb 678910 FOUND in masterlist (row 1)
C:\ c.txt 44kb 111213 FOUND in masterlist (row 2)
</code></pre>
<p>请原谅偏差,它们是分开的:)</p>