Python Difflib Deltas and Comparing Ndiff

# Python Difflib demo
# Author: Neal Walters
# loosely based on http://ahlawat.net/wordpress/?p=371
# 01/17/2011
import difflib

# build the files here - later we will just read the files probably
file1Contents = """
for j = 1 to 10:
print "ABC"
print "DEF"
print "HIJ"
print "JKL"
print "Hello World"
print "j=" + j
print "XYZ"
"""
file2Contents = """
for j = 1 to 10:
print "ABC"
print "DEF"
print "HIJ"
print "JKL"
print "Hello World"
print "XYZ"
print "The end"
"""
filename1 = "diff_file1.txt"
filename2 = "diff_file2.txt"
with open(filename1, "w") as file1:
    file1.write(file1Contents)
with open(filename2, "w") as file2:
    file2.write(file2Contents)
# end of file build

with open(filename1) as f:
    lines1 = f.readlines()
with open(filename2) as f:
    lines2 = f.readlines()

print("\n FILE 1 \n")
for line in lines1:
    print(line, end="")
print("\n FILE 2 \n")
for line in lines2:
    print(line, end="")

# ndiff() returns a one-shot generator; keep it as a list so it can be reused below
diffSequence = list(difflib.ndiff(lines1, lines2))
print("\n ----- SHOW DIFF ----- \n")
for line in diffSequence:
    print(line, end="")

diffObj = difflib.Differ()
deltaSequence = diffObj.compare(lines1, lines2)
deltaList = list(deltaSequence)
print("\n ----- SHOW DELTALIST ----- \n")
for line in deltaList:
    print(line, end="")

# let's suppose we store just the diffSequence in the database,
# then we want to recreate the original (file1) from it;
# restore() reads both versions out of the delta itself
restoredFile1Lines = difflib.restore(diffSequence, 1)  # 1 = rebuild sequence 1 (file1)
restoreFileList = list(restoredFile1Lines)
print("\n ----- SHOW REBUILD OF FILE1 ----- \n")
# When diffSequence was a plain generator this printed nothing:
# the SHOW DIFF loop above had already exhausted it.
for line in restoreFileList:
    print(line, end="")

3 Answers

User

#1 · edited 2024-06-06 08:43:01

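A diff has to contain enough information to patch one version into the other. So yes: for your experiment of changing a single line in a very small document, storing the whole document may well be cheaper.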

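The library functions return iterators to make things easier for clients that are short on memory, or that only need to look at part of the resulting sequence. That's fine in Python, because any iterator can be turned into a list with the very short expression list(an_iterator).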
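This is also why the rebuild step in the question printed nothing: the ndiff() generator had already been used up by the SHOW DIFF loop. Here is a minimal sketch with made-up lines. It shows a one-shot generator, then the list(...) fix, then difflib.restore() rebuilding both versions from the stored delta:

```python
import difflib

old = ["ABC\n", "j=1\n", "XYZ\n"]
new = ["ABC\n", "XYZ\n", "The end\n"]

delta = difflib.ndiff(old, new)  # a generator: nothing is computed yet
first_pass = list(delta)         # consumes the generator
second_pass = list(delta)        # already exhausted, so this is empty

# Materialize once, then reuse the list as often as needed.
delta = list(difflib.ndiff(old, new))
restored_old = list(difflib.restore(delta, 1))  # lines of sequence 1
restored_new = list(difflib.restore(delta, 2))  # lines of sequence 2
print(len(first_pass), len(second_pass), restored_old == old, restored_new == new)
```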

Most diffing is done on lines of text, but it can also be done character by character, and difflib can do that. Take a look at the SequenceMatcher class in difflib.
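difflib.SequenceMatcher accepts any two sequences of hashable elements, so passing two strings compares them character by character. A small sketch with made-up strings, printing the opcodes that describe how to turn one string into the other:

```python
import difflib

sm = difflib.SequenceMatcher(None, "print j", "print k")
ops = sm.get_opcodes()  # list of (tag, i1, i2, j1, j2) tuples
for tag, i1, i2, j1, j2 in ops:
    # sm.a and sm.b are the two sequences being compared
    print(tag, repr(sm.a[i1:i2]), repr(sm.b[j1:j2]))
```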

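The examples everywhere use human-friendly output, but internally a diff is handled in a much more compact, computer-friendly form. Diffs also usually carry redundant information (such as the text of a line to be deleted) so that patching and merging changes stays safe. If you're comfortable without that safety net, your own code can strip the redundancy out.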
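As an illustration of dropping that redundancy, here is a minimal sketch. make_delta and apply_delta are hypothetical helper names, not part of difflib. The sketch keeps only the non-equal SequenceMatcher opcodes plus the new elements they need, and leaves out the text being deleted. The trade-off is the one described above: the delta can only patch forward, and it cannot check that it is being applied to the right old version.

```python
import difflib

def make_delta(a, b):
    """Keep (tag, i1, i2, new_elements) for every non-equal opcode."""
    sm = difflib.SequenceMatcher(None, a, b)
    return [(tag, i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(a, delta):
    """Rebuild b (as a list) from a and a delta made by make_delta."""
    out, pos = [], 0
    for _tag, i1, i2, new in delta:
        out.extend(a[pos:i1])  # unchanged stretch of a
        out.extend(new)        # inserted/replacement elements (empty for delete)
        pos = i2               # skip the deleted/replaced elements of a
    out.extend(a[pos:])
    return out

a = ["for j\n", "print j\n", "print XYZ\n"]
b = ["for j\n", "print XYZ\n", "print end\n"]
print(make_delta(a, b))
```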

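I just read that difflib favors results that look right to people over optimality (it does not produce minimal edit sequences), which is something I won't argue against. There are well-known algorithms that quickly produce minimal sets of changes.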
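For contrast, here is a sketch of a diff that is guaranteed minimal. It uses a plain longest-common-subsequence table. That is the simple O(n·m) approach, not one of the fast algorithms (such as Myers's) the answer alludes to. lcs_diff is a hypothetical name, and the output imitates ndiff-style prefixes.

```python
def lcs_diff(a, b):
    """Return a minimal edit script as (prefix, element) pairs."""
    n, m = len(a), len(b)
    # L[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("  ", a[i]))   # common element
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            ops.append(("- ", a[i]))   # only in a
            i += 1
        else:
            ops.append(("+ ", b[j]))   # only in b
            j += 1
    ops.extend(("- ", x) for x in a[i:])
    ops.extend(("+ ", x) for x in b[j:])
    return ops

ops = lcs_diff("ABCABBA", "CBABAC")
print(sum(1 for p, _ in ops if p != "  "), "edits")
```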

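I once wrote a generic diffing engine, with one optimal algorithm, in about 1250 lines of Java (JRCS). It works on any sequence of elements that can be compared for equality. If you want to build your own solution, I think a translation/reimplementation of JRCS shouldn't take more than 300 lines of Python.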

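Post-processing the output of difflib to make it more compact is also an option. Here's an example for a small file with three changes (an addition, a change, and a deletion):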

---  
+++  
@@ -7,0 +7,1 @@
+aaaaa
@@ -9,1 +10,1 @@
-c= 0
+c= 1
@@ -15,1 +16,0 @@
-    m = re.match(code_re, text)
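A diff in this shape, with no context lines, can be produced with difflib.unified_diff(..., n=0). Below is a minimal sketch with made-up inputs containing one addition, one change and one deletion. Note that current Python versions omit the ",1" for one-line ranges, so the hunk headers read like "@@ -3 +4 @@" rather than "@@ -9,1 +10,1 @@".

```python
import difflib

old = ["a", "b", "c= 0", "d", "e"]
new = ["a", "x", "b", "c= 1", "d"]  # one addition, one change, one deletion

# n=0 drops all context lines; lineterm="" because these strings have no "\n"
diff = list(difflib.unified_diff(old, new, n=0, lineterm=""))
print("\n".join(diff))
```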

What the patch says can easily be condensed to:

+7,1 
aaaaa
-9,1 
+10,1
c= 1
-15,1

For your own example, the condensed output would be:

-8,1
+9,1
print "The end"

To be safe, it may be a good idea to keep a leading marker (">") on the lines that must be inserted:

-8,1
+9,1
>print "The end"

Is that closer to what you need?

Here's a simple compaction function. You'd have to write your own code to apply patches in this format, but that should be straightforward.

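import re

# Hunk header such as "@@ -9,1 +10,1 @@". Newer difflib versions omit
# the ",count" part when it is 1 (e.g. "@@ -9 +10 @@").
HUNK_RE = re.compile(r'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')

def compact_a_unidiff(s):
    """Condense unified-diff lines (generated with n=0) into
    '-start,count' / '+start,count' commands plus '>'-prefixed added lines."""
    result = []
    in_hunk = False  # the '---'/'+++' file headers come before the first hunk
    for l in s:
        if l.startswith('@@'):
            in_hunk = True
            del_start, del_count, add_start, add_count = HUNK_RE.match(l).groups()
            del_count = del_count or '1'
            add_count = add_count or '1'
            if del_count != '0':
                result.append('-%s,%s' % (del_start, del_count))
            if add_count != '0':
                result.append('+%s,%s' % (add_start, add_count))
        elif in_hunk and l.startswith('+'):
            result.append('>' + l[1:])
    return result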
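The answer leaves applying the compact format to the reader. Below is a minimal sketch of an applier. It assumes commands in the form produced by compact_a_unidiff above, with 1-based line numbers as in unified diffs. apply_compact is a hypothetical name. A pure insertion keeps only its new-file position, so the applier copies unchanged old lines until the output reaches that position.

```python
def apply_compact(old_lines, commands):
    """Rebuild the new file from the old one and a compact patch.

    Assumed command format:
      '-start,count'  delete `count` old lines starting at old line `start`
      '+start,count'  following '>' entries are inserted starting at new line `start`
      '>text'         one inserted line
    """
    out = []
    pos = 0  # number of old lines consumed so far
    for cmd in commands:
        if cmd.startswith('>'):
            out.append(cmd[1:])
        elif cmd.startswith('-'):
            start, count = (int(x) for x in cmd[1:].split(','))
            out.extend(old_lines[pos:start - 1])  # copy unchanged lines
            pos = start - 1 + count               # skip the deleted ones
        elif cmd.startswith('+'):
            start, _count = (int(x) for x in cmd[1:].split(','))
            need = (start - 1) - len(out)  # unchanged lines before the insertion
            out.extend(old_lines[pos:pos + need])
            pos += need
    out.extend(old_lines[pos:])  # unchanged tail
    return out

# The question's files, one string per line (the first line is empty).
old = ["", "for", "ABC", "DEF", "HIJ", "JKL", "Hello", "j=", "XYZ"]
new = ["", "for", "ABC", "DEF", "HIJ", "JKL", "Hello", "XYZ", "The end"]
print(apply_compact(old, ["-8,1", "+9,1", ">The end"]) == new)
```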

User

#2 · edited 2024-06-06 08:43:01

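If you only want the changes, you need a unified or context diff. What you're seeing is bigger than the file because it also contains the lines the two versions have in common.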
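A quick comparison on made-up input: ndiff repeats every common line, while unified_diff with n=0 keeps only the changed lines plus a small header. With the default n=3 it would also include up to three lines of context around each change.

```python
import difflib

a = ["line %d\n" % i for i in range(1, 11)]
b = list(a)
b[4] = "changed\n"  # a single-line change in a 10-line file

full = list(difflib.ndiff(a, b))                 # every line, common ones included
changes = list(difflib.unified_diff(a, b, n=0))  # headers plus changed lines only
print(len(full), len(changes))
```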

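The advantage of returning a generator is that the whole result doesn't have to be held in memory at once. That's useful when diffing very large files.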
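One way to take advantage of that is to stream the delta straight into a file-like object without ever building a list of the diff output. This is a sketch with made-up lines. The io.StringIO stands in for a real file or sys.stdout. The underlying comparison still needs both input lists.

```python
import difflib
import io

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

out = io.StringIO()  # stands in for an open file or sys.stdout
# writelines() pulls one delta line at a time from the generator
out.writelines(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
text = out.getvalue()
print(text)
```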

User

#3 · edited 2024-06-06 08:43:01

I'm also still trying to figure out why many difflib functions return a generator instead of a list, what's the advantage there?

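Well, think about it a bit more: if you're comparing files, those files can in theory (and in practice) be huge. Returning the delta as a list, for example, would mean reading all of the data into memory, which isn't a smart thing to do.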

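As for returning only the differences, using a generator has another advantage here: just iterate over the delta and keep whichever lines you're interested in.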

If you read the difflib documentation on Differ-style deltas, you'll see this passage:

Each line of a Differ delta begins with a two-letter code:
Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

So if you only want the differences, you can easily filter them out with str.startswith.
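For example, here is how to keep only the "- " and "+ " lines of a Differ delta. This is a sketch with made-up input lines. str.startswith accepts a tuple of prefixes, so one call covers both codes and drops the "  " and "? " lines.

```python
import difflib

lines1 = ["ABC\n", "j=1\n", "XYZ\n"]
lines2 = ["ABC\n", "XYZ\n", "The end\n"]

changes = [l for l in difflib.Differ().compare(lines1, lines2)
           if l.startswith(("- ", "+ "))]
print("".join(changes), end="")
```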

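You can also use difflib.context_diff to get a compact delta that shows only the changes, plus a few lines of surrounding context controlled by its n parameter.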
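A small sketch with made-up lines. Passing n=0 removes the context lines entirely, and changed lines are marked with "! " on both sides of the hunk.

```python
import difflib

a = ["one\n", "two\n", "three\n"]
b = ["one\n", "2\n", "three\n"]

# n=0 leaves out the surrounding context lines entirely
delta = list(difflib.context_diff(a, b, n=0))
print("".join(delta), end="")
```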
