<p>A strategy like this might make your job easier. It tracks items by <code>id</code> in a <code>merged_items</code> dict, storing the values of <code>name</code>, <code>blah1</code>, and <code>blah2</code>. Then, using the <code>csv</code> module's <code>reader</code>, it iterates over each file line by line rather than loading everything at once, which keeps memory usage low. Finally, it writes the merged items back out, again one row at a time. You will need to adapt this to your specific use case, but it should be a decent starting point.</p>
<pre><code>import csv

merged_items = {}

# First file: record the name for each id.
with open('file1.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        row_id = row[0]
        name = row[3]
        merged_items[row_id] = {'name': name}

# Second file: add blah1 to each existing item.
with open('file2.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        row_id = row[0]
        blah1 = row[2]
        merged_items[row_id]['blah1'] = blah1

# Third file: add blah2 to each existing item.
with open('file3.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        row_id = row[0]
        blah2 = row[3]
        merged_items[row_id]['blah2'] = blah2

# Write the merged rows out, one at a time.
with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter='\t')  # change these options as you see fit
    for row_id, metadata in merged_items.items():
        writer.writerow([row_id, metadata['name'], metadata['blah1'], metadata['blah2']])
</code></pre>
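<p>One caveat: the lookups above assume every <code>id</code> in file2 and file3 also appears in file1; an unknown <code>id</code> would raise a <code>KeyError</code>. If your files can disagree, a more defensive variant is to build each entry with <code>setdefault</code> and fill missing columns with <code>get</code> when writing. The sketch below uses hypothetical filenames, column positions, and sample data purely for illustration:</p>

```python
import csv

# Tiny sample inputs (hypothetical data) so the sketch runs end to end.
with open('file1.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['id', 'a', 'b', 'name'],
        ['1', 'x', 'y', 'Alice'],
        ['2', 'x', 'y', 'Bob'],
    ])
with open('file2.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['id', 'a', 'blah1'],
        ['1', 'x', 'red'],
        ['3', 'x', 'green'],  # id 3 never appears in file1
    ])
with open('file3.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['id', 'a', 'b', 'blah2'],
        ['1', 'x', 'y', 'big'],
    ])

merged_items = {}

def merge_column(filename, key, column):
    """Merge one column from one CSV into merged_items, keyed by id."""
    with open(filename, 'r', newline='') as csv_file:
        reader = csv.reader(csv_file)
        next(reader)  # skip the header row
        for row in reader:
            # setdefault creates the entry if this id was not seen before
            merged_items.setdefault(row[0], {})[key] = row[column]

merge_column('file1.csv', 'name', 3)
merge_column('file2.csv', 'blah1', 2)
merge_column('file3.csv', 'blah2', 3)

with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter='\t')
    for row_id, metadata in merged_items.items():
        # get() substitutes an empty string for any value missing for this id
        writer.writerow([row_id,
                         metadata.get('name', ''),
                         metadata.get('blah1', ''),
                         metadata.get('blah2', '')])
```

<p>The trade-off is that ids present in only some files produce partially empty rows instead of an error, so decide which behavior actually signals a data problem in your case.</p>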