Comparing two files in Python

0 votes

4 answers

634 views

Asked 2025-04-18 10:06

I'm comparing two files, A and C, in Python, but for some reason my nested loop doesn't seem to work properly:

with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
     for lineC in fileC:
         fieldC = lineC.split('#')
         for lineA in fileA:
             fieldA = lineA.split('#')
             print 'UserID Clicks' + fieldC[0]
             print 'UserID Activities' + fieldA[0]
             if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
                 print 'OK'

Here only the lines of file C seem to be compared; the remaining lines of file A seem to be ignored.

Can anyone help me solve this?

Tags: code debugging, file comparison, line comparison, nested loops

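4 Answers

I know this is an old post, but it shows up when people search Google for a way to compare two text files.

This code works for me.

You could update it to use "with open" instead and adapt it to your own needs, but as it stands it already does the job.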

# Ask the user to enter the names of files to compare
fname1 = input("Enter the first filename (text1.txt): ")
fname2 = input("Enter the second filename (text2.txt): ")

# Open file for reading in text mode (default mode)
f1 = open(fname1)
f2 = open(fname2)

# Print confirmation
print("-----------------------------------")
print("Comparing files ", " > " + fname1, " < " +fname2, sep='\n')
print("-----------------------------------")

# Read the first line from the files
f1_line = f1.readline()
f2_line = f2.readline()

# Initialize counter for line number
line_no = 1

# Loop if either file1 or file2 has not reached EOF
while f1_line != '' or f2_line != '':

    # Strip trailing whitespace (including the newline)
    f1_line = f1_line.rstrip()
    f2_line = f2_line.rstrip()

    # Compare the lines from both file
    if f1_line != f2_line:

        # If a line does not exist on file2 then mark the output with + sign
        if f2_line == '' and f1_line != '':
            print(">+", "Line-%d" % line_no, f1_line)
        # otherwise output the line on file1 and mark it with > sign
        elif f1_line != '':
            print(">", "Line-%d" % line_no, f1_line)

        # If a line does not exist on file1 then mark the output with + sign
        if f1_line == '' and f2_line != '':
            print("<+", "Line-%d" % line_no, f2_line)
        # otherwise output the line on file2 and mark it with < sign
        elif f2_line != '':
            print("<", "Line-%d" %  line_no, f2_line)

        # Print a blank line
        print()

    #Read the next line from the file
    f1_line = f1.readline()
    f2_line = f2.readline()


    #Increment line counter
    line_no += 1

# Close the files
f1.close()
f2.close()

Answered 2025-04-18 by Python大师

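You are comparing every line of file A against every line of file C. That means that for each line of file C you would have to read the whole of file A. Even if you moved file A's read pointer back to the start each time, you would end up reading it over and over again.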
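A minimal sketch of why the inner loop in the question effectively runs only once: a file object is an iterator, so after one full pass it yields nothing more until its position is reset. Here `io.StringIO` stands in for a real file, and the sample lines are made up for illustration.

```python
import io

# io.StringIO behaves like an open text file for iteration purposes.
file_a = io.StringIO("a1\na2\n")

first_pass = [line.strip() for line in file_a]   # reads everything
second_pass = [line.strip() for line in file_a]  # nothing left to read

file_a.seek(0)                                   # rewind to the start
third_pass = [line.strip() for line in file_a]   # reads everything again

print(first_pass, second_pass, third_pass)
```

Rewinding inside the outer loop would make the nested loop work, at the cost of rereading file A once per line of file C.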

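The simplest approach is actually to read both files at the same time, for as long as both still have content. If the two current lines are equal, do whatever you need to do and then read the next line from both files.

If they differ, read only from the file whose current line is smaller (for example, if the line from file A is less than the line from file C, read only from file A, and vice versa). Note that this only works if both files are sorted on the value being compared.

Finally, once one file runs out, use two more loops to handle the remaining lines (one loop per file, because you don't know which file will run out first).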
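The merge-style walk described in this answer could be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's own code: the function name `merge_compare` is hypothetical, whole lines are used as the comparison key, both inputs are assumed to be sorted, and `io.StringIO` objects stand in for the real files.

```python
import io

def merge_compare(file_a, file_c):
    """Walk two sorted line iterators in step.

    Returns (common, only_a, only_c) as lists of stripped lines.
    """
    common, only_a, only_c = [], [], []
    line_a = next(file_a, None)
    line_c = next(file_c, None)

    # Main loop: runs while both inputs still have a current line.
    while line_a is not None and line_c is not None:
        key_a, key_c = line_a.rstrip("\n"), line_c.rstrip("\n")
        if key_a == key_c:
            common.append(key_a)
            line_a = next(file_a, None)
            line_c = next(file_c, None)
        elif key_a < key_c:
            only_a.append(key_a)          # advance only the smaller side
            line_a = next(file_a, None)
        else:
            only_c.append(key_c)
            line_c = next(file_c, None)

    # Two trailing loops: drain whichever input still has lines.
    while line_a is not None:
        only_a.append(line_a.rstrip("\n"))
        line_a = next(file_a, None)
    while line_c is not None:
        only_c.append(line_c.rstrip("\n"))
        line_c = next(file_c, None)

    return common, only_a, only_c

# Made-up sample data, sorted in both inputs.
file_a = io.StringIO("u1\nu3\nu5\n")
file_c = io.StringIO("u1\nu2\nu3\nu4\n")
common, only_a, only_c = merge_compare(file_a, file_c)
print(common, only_a, only_c)
```

Each file is read exactly once, so the cost grows with the total number of lines rather than with their product.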

Answered 2025-04-18 by Python大师

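The thing about nested loops (which is what you are running into) is that the inner loop runs to completion on every iteration of the outer loop. So it is more effective to set lineA by calling fileA's iterator directly: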

with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
     for lineC in fileC:
         fieldC = lineC.split('#')
         lineA = next(fileA)
         fieldA = lineA.split('#')
         print 'UserID Clicks' + fieldC[0]
         print 'UserID Activities' + fieldA[0]
         if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
             print 'OK'

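With this logic, once fileC is exhausted, any extra lines in fileA are ignored. If fileC has more lines than fileA, next(fileA) raises StopIteration unless you add an explicit check.

Another option is to use itertools.izip() to pair up the lines of the two files: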

import itertools
with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
     for lineC, lineA in itertools.izip(fileC, fileA):
         fieldC = lineC.split('#')
         fieldA = lineA.split('#')
         print 'UserID Clicks' + fieldC[0]
         print 'UserID Activities' + fieldA[0]
         if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
             print 'OK'

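I can't think of a particular reason to prefer one over the other, but if the files are at all large, don't be tempted by the built-in zip(); use itertools.izip(). The former returns a list, so memory use depends on the size of the files, whereas the latter is a lazy iterator that produces values one at a time as needed. (This applies to Python 2. In Python 3, itertools.izip() no longer exists and the built-in zip() is itself lazy.)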
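For readers on Python 3, the same pairwise idea might look like the sketch below. It uses `itertools.zip_longest` so that lines present in only one file are reported rather than silently dropped. The function name `compare_pairwise`, the returned tuple, and the sample data are assumptions made for illustration; the `#` separator and the comparison of fields 0 and 2 follow the question.

```python
import io
from itertools import zip_longest

def compare_pairwise(file_c, file_a, sep="#"):
    """Compare line N of one file with line N of the other.

    Returns (matches, unpaired): 1-based line numbers where fields 0 and 2
    agree, and line numbers that exist in only one of the two files.
    """
    matches, unpaired = [], []
    pairs = zip_longest(file_c, file_a)  # pads the shorter file with None
    for line_no, (line_c, line_a) in enumerate(pairs, start=1):
        if line_c is None or line_a is None:
            unpaired.append(line_no)
            continue
        field_c = line_c.rstrip("\n").split(sep)
        field_a = line_a.rstrip("\n").split(sep)
        # Guard against short lines before indexing field 2.
        if (len(field_c) > 2 and len(field_a) > 2
                and field_c[0] == field_a[0] and field_c[2] == field_a[2]):
            matches.append(line_no)
    return matches, unpaired

# Made-up sample data; io.StringIO stands in for the opened files.
file_c = io.StringIO("u1#x#10\nu2#y#20\nu3#z#30\n")
file_a = io.StringIO("u1#p#10\nu2#q#99\n")
matches, unpaired = compare_pairwise(file_c, file_a)
print(matches, unpaired)
```

Like the snippets above, this only compares lines at the same position; it does not find a matching record that sits on a different line in the other file.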

Answered 2025-04-18 by Python大师

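Your problem is that after the first pass over fileA, the file pointer is at the end of the file. To read it again from the start, you would have to move the pointer back to the beginning of the file (for example with fileA.seek(0)).
So consider building a list from each of the two files instead; you can then iterate over those lists as many times as you need. For example: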

fileC_list = fileC.readlines()
fileA_list = fileA.readlines()
for lineC in fileC_list:
  # do something with lineC
  for lineA in fileA_list:
    # do something with lineC and lineA
    pass
Answered 2025-04-18 by Python大师
