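<p>As I mentioned in the comments: file objects are streams. Once you have read past a point you do not see it again unless you rewind, so one of the files has to be held in memory if every line of one is to be compared against every line of the other.</p>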
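<p>A minimal sketch of that stream behaviour. It uses <code>io.StringIO</code> as a stand-in for an open text file (my assumption, so that it runs without touching the disk); a real file object opened with <code>open()</code> iterates the same way.</p>
<pre><code>import io

# io.StringIO stands in for an open text file here (an assumption made so
# the sketch needs no file on disk).
stream = io.StringIO("a,1\nb,2\nc,3\n")

first_pass = [line.strip() for line in stream]
second_pass = [line.strip() for line in stream]  # stream is already at the end

print(first_pass)   # ['a,1', 'b,2', 'c,3']
print(second_pass)  # []

# Rewinding makes the lines readable again, but doing that once per line of
# the other file would mean re-reading the whole file every time.
stream.seek(0)
third_pass = [line.strip() for line in stream]
print(third_pass == first_pass)  # True
</code></pre>
<p>Holding the smaller file in a dictionary avoids the repeated rewinding.</p>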
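<p>This code reads the smaller file into memory and processes the larger file line by line.</p>
<p>The first matching line of the larger file supplies the data for every row of the smaller file that shares its key. Those rows are then deleted from memory, so they cannot match any later line.</p>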
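<p>The "first match wins" idea can be sketched on small in-memory lists. The rows here are made up for illustration, and the sketch uses <code>dict.pop</code> as a compact alternative to the <code>get</code> plus <code>del</code> pair in the program further down:</p>
<pre><code># Hypothetical stand-ins for the two files (not the data used in this answer).
small_rows = [["x", "10"], ["y", "20"], ["z", "20"]]
large_rows = [["20", "1.0", "2.0"], ["20", "9.9", "9.9"], ["10", "3.0", "4.0"]]

# Group the smaller data set by its key column, keeping the original order.
lookup = {}
for row in small_rows:
    lookup.setdefault(row[1], []).append(row)

joined = []
for key, *values in large_rows:
    # pop() returns the stored rows and removes the key in one step, so a
    # later line with the same key finds nothing and is skipped.
    for small in lookup.pop(key, []):
        joined.append(small + values)

print(joined)
# [['y', '20', '1.0', '2.0'], ['z', '20', '1.0', '2.0'], ['x', '10', '3.0', '4.0']]
</code></pre>
<p>The second line with key <code>20</code> contributes nothing, because its key was already removed by the first.</p>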
<p>Create the files:</p>
<pre><code>with open("f1.txt", "w") as f:
    f.write("""66054,14.7065,42.1115
66054,14.7085,42.106
66054,14.7268,42.0937
66054,14.6739,42.125
66054,14.7268,42.0937
66100,14.116,42.3301
66100,14.1405,42.3392
88067,16.431,38.7287
88068,16.5339,38.6899
88068,16.5499,38.685
88068,16.5419,38.6875
87076,16.4795,39.7905
87076,16.4743,39.8161
87100,16.2531,39.2989
87100,16.2944,39.2674
87100,16.3039,39.2709
87052,16.43,39.3449
87053,16.3399,39.3101
87054,16.3171,39.1784""")

with open("f2.txt", "w") as f:
    f.write("""ABC,66100
"CDF",65125
"123",65125
1234,64100
0123,75025
lmn,85025
abc,88046
"Random",88068
"Raond2",87100
"Raondm3",87100
Raondom4,87054""")
</code></pre>
<p>Program:</p>
<pre><code>import csv

d2 = {}
# smaller file: load into memory
with open("f2.txt") as f:
    cr = csv.reader(f)
    for row in cr:
        # store rows in a list under their key (row[1]) to keep their order
        # and allow multiple rows with the same row[1] value
        k = d2.setdefault(row[1], [])
        k.append(row)

# larger file: process line by line
with open("f1.txt") as f, open("f3.txt", "w", newline="") as nf:
    cr = csv.reader(f)
    writer = csv.writer(nf)
    for row in cr:
        if d2.get(row[0], []):
            for sl in d2.get(row[0]):
                writer.writerow(sl + [row[1], row[2]])
            # remove from d2 so no reappearing rows will be written
            del d2[row[0]]

with open("f3.txt") as f:
    print(f.read())
</code></pre>
<p>Output:</p>
<pre><code>ABC,66100,14.116,42.3301
Random,88068,16.5339,38.6899
Raond2,87100,16.2531,39.2989
Raondm3,87100,16.2531,39.2989
Raondom4,87054,16.3171,39.1784
</code></pre>
<p>Only rows from file 2 whose key has an exact match in file 1 end up in file 3.</p>
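<p>To see which rows of file 2 are left out, one quick check is to compare their keys against the set of keys in file 1. This is only a sketch: the values are copied inline from the two sample files above (with the quotes already removed, as <code>csv.reader</code> does), and the names <code>f1_keys</code>, <code>f2_rows</code> and <code>unmatched</code> are mine.</p>
<pre><code># Distinct first-column values of f1.txt, copied from the sample data.
f1_keys = {"66054", "66100", "88067", "88068", "87076",
           "87100", "87052", "87053", "87054"}

# Rows of f2.txt as csv.reader would return them.
f2_rows = [
    ["ABC", "66100"], ["CDF", "65125"], ["123", "65125"], ["1234", "64100"],
    ["0123", "75025"], ["lmn", "85025"], ["abc", "88046"], ["Random", "88068"],
    ["Raond2", "87100"], ["Raondm3", "87100"], ["Raondom4", "87054"],
]

# Keys are compared as plain strings, so only identical text counts as a match.
unmatched = [row for row in f2_rows if row[1] not in f1_keys]
print(unmatched)
# [['CDF', '65125'], ['123', '65125'], ['1234', '64100'],
#  ['0123', '75025'], ['lmn', '85025'], ['abc', '88046']]
</code></pre>
<p>These six rows are the ones still left in <code>d2</code> once the program has finished; the other five are the rows shown in the output.</p>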