Iterating over two files line by line at the same time in Python

0 votes

2 answers

2936 views

Asked 2025-04-18 14:47

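I am trying to compare a column in two files to see whether the values match. When they match, I want to merge the data from those rows. My problem is that when I read the two files line by line, Python does not walk through both files together looking for matches. Instead, it iterates over one file correctly but goes over the same line of the second file multiple times...

I have run into this before and never found a solution. I know indentation is part of the problem, because I got the loops mixed up when using "for line in a, for line in b", so I thought the attempts below would work, but they did not. I looked for solutions, but nobody seems to use the same approach, so I wonder whether I am on completely the wrong track. Can anyone explain what a better approach would be, whether mine can work at all, and if not, why? Thanks, I really appreciate it!

Here is the format of my two files. Basically, I want to compare the filename column in both files and, where the values match, merge the rows.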

file1:
cluster_id  hypothesis_id   filename    M1_name_offset  Orientation
1   71133076    unique_name_1.png   esc_sox2_Sox1_80_4  forward
1   50099120    unique_name_4.png   hb_cebpb_ETS1_139_7 forward
1   91895576    unique_name_11.png  he_tal1_at_AC_acptr_258_11  forward

file2:
Name                Cluster_No  Pattern     filename
esc_sox2_Sox1_80    Cluster1    AP1(1N)ETS      unique_name_4.png
hb_cebpb_ETS1_139   Cluster1    CREB(1N)ETS     unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2    ETS(-1N)ZIC     unique_name_3.png

What I have tried:

for aline in file1:
    motif1 = aline.split()[2]
    for bline in file2:
        motif2 = bline.split()[-1]
        if motif1 == motif2:
            print "match", aline, bline

I also tried:

for aline in file1:
    motif1 = aline.split()[2]
for bline in file2:
    motif2 = bline.split()[-1]
    if motif1 == motif2:
        print "match", aline, bline

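I also tried string formatting, but it made no difference. The first approach iterates over file2 incorrectly, and the second produces no output at all. I have tried many times with different indentation and extra code, but I really do not know how to fix it! Please help :(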

2 Answers

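I assume you are using Python 3. Here is a handy abstraction, a helper called iterlines. It hides the details of opening, reading, pairing, and closing n files. Note the use of zip_longest, which prevents the tail of a longer file from being silently dropped.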

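def iterlines(*paths, fillvalue=None, **open_kwargs):
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        # Pair up the lines; shorter files are padded with fillvalue.
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            # suppress() with no arguments suppresses nothing, so name the
            # exception type: one failed close must not block the others.
            with suppress(Exception):
                file_.close()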

Usage

for line_a, line_b in iterlines('a.txt', 'b.txt'):
    print(line_a, line_b)

Full code

from contextlib import suppress
from itertools import zip_longest


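def iterlines(*paths, fillvalue=None, **open_kwargs):
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        # Pair up the lines; shorter files are padded with fillvalue.
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            # suppress() with no arguments suppresses nothing, so name the
            # exception type: one failed close must not block the others.
            with suppress(Exception):
                file_.close()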


for lines in iterlines('a.txt', 'b.txt', 'd.txt'):
    print(lines)
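
To illustrate why zip_longest matters here, the following minimal sketch compares it with the built-in zip on inputs of different lengths. The in-memory StringIO objects and their contents are stand-ins invented for this example, not the files from the question.

```python
from io import StringIO
from itertools import zip_longest


def make_files():
    # Hypothetical stand-ins for two files: three lines versus one line.
    return StringIO("a1\na2\na3\n"), StringIO("b1\n")


# zip stops as soon as the shorter input runs out.
file_a, file_b = make_files()
zipped = [(x.strip(), y.strip()) for x, y in zip(file_a, file_b)]

# zip_longest keeps going and pads the shorter input with fillvalue.
file_a, file_b = make_files()
padded = [
    (x.strip(), y.strip())
    for x, y in zip_longest(file_a, file_b, fillvalue="")
]

print(zipped)  # [('a1', 'b1')]
print(padded)  # [('a1', 'b1'), ('a2', ''), ('a3', '')]
```

With plain zip, the lines a2 and a3 never appear in the output, and no error is raised.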

Answered 2025-04-18 by Python大师

Use the built-in zip function.

with open(file1) as f1, open(file2) as f2:
    for line1, line2 in zip(f1, f2):
        motif1 = line1.split()[2]   # filename column in file1
        motif2 = line2.split()[-1]  # filename column in file2
        ...

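Note that zip behaves differently in Python 2 and Python 3. In Python 2, zip builds the whole list of pairs in memory, so the lazy itertools.izip is more efficient there; in Python 3, zip is already lazy.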
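
One caveat: zip pairs lines by position, whereas in the sample data from the question the matching filenames sit on different rows. The nested loop in the question also fails for a related reason: a file object is an iterator that is used up after one full pass, so the inner loop has nothing left to read once the first outer iteration finishes. A dictionary lookup avoids both issues. The following is only a minimal sketch: the function name merge_on_filename is made up for illustration, and it assumes that each file starts with a header line and that filenames are unique within file2.

```python
from io import StringIO

# Sample rows from the question, held in memory for the demo.
# With real files, pass handles from open(...) instead.
FILE1 = """cluster_id  hypothesis_id   filename    M1_name_offset  Orientation
1   71133076    unique_name_1.png   esc_sox2_Sox1_80_4  forward
1   50099120    unique_name_4.png   hb_cebpb_ETS1_139_7 forward
1   91895576    unique_name_11.png  he_tal1_at_AC_acptr_258_11  forward
"""
FILE2 = """Name                Cluster_No  Pattern     filename
esc_sox2_Sox1_80    Cluster1    AP1(1N)ETS      unique_name_4.png
hb_cebpb_ETS1_139   Cluster1    CREB(1N)ETS     unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2    ETS(-1N)ZIC     unique_name_3.png
"""


def merge_on_filename(file1, file2):
    """Join rows whose filename columns match (hypothetical helper)."""
    next(file2, None)  # assumption: the first line is a header
    by_name = {}
    for line in file2:
        fields = line.split()
        if fields:
            # filename is the last column of file2; later duplicates overwrite
            by_name[fields[-1]] = fields

    next(file1, None)  # assumption: the first line is a header
    merged = []
    for line in file1:
        fields = line.split()
        # filename is the third column of file1
        if len(fields) > 2 and fields[2] in by_name:
            # drop file2's own filename column to avoid repeating it
            merged.append(fields + by_name[fields[2]][:-1])
    return merged


rows = merge_on_filename(StringIO(FILE1), StringIO(FILE2))
for row in rows:
    print("\t".join(row))
```

Each file is read exactly once, so the work grows with the total number of lines rather than with the product of the two file lengths, as it would in a nested loop.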

Answered 2025-04-18 by Python大师
