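Patching a text file
I am trying to build up a text file step by step from a series of diff patches. I start with an empty text file and need to apply 600+ patches to end up with the document I wrote (I tracked the changes with Mercurial). Each change to the file needs extra information added to it, so I can't simply use diff and patch on the command line.
I spent a whole day writing (and rewriting) a tool that parses the diff files and changes the text file accordingly, but one of the diff files makes my program behave in a way I can't make sense of.
This function is called for every diff file: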
# filename = name of the diff file
# date = extra information to be added as a prefix to each added line
def process_diff(filename, date):
    # that's the file all the patches will be applied to
    merge_file = open("thesis_merged.txt", "r")
    # map its content to a list to manipulate it in memory
    merge_file_lines = []
    for line in merge_file:
        line = line.rstrip()
        merge_file_lines.append(line)
    merge_file.close()
    # open for writing:
    merge_file = open("thesis_merged.txt", "w")
    # that's the diff file, containing all the changes
    diff_file = open(filename, "r")
    print "-", filename, "-" * 20
    # also map it to a list
    diff_file_lines = []
    for line in diff_file:
        line = line.rstrip()
        if not line.startswith("\\ No newline at end of file"): # useless information ... or not?
            diff_file_lines.append(line)
    # ignore header:
    #--- thesis_words_0.txt 2010-12-04 18:16:26.020000000 +0100
    #+++ thesis_words_1.txt 2010-12-04 18:16:26.197000000 +0100
    diff_file_lines = diff_file_lines[2:]
    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )
    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]
        line_nr_minus = tmp[0].split(",")[0]
        line_nr_minus = int(line_nr_minus[1:]) # 252
        line_nr_plus = tmp[1].split(",")[0]
        line_nr_plus = int(line_nr_plus[1:]) # 251
        for j, line in enumerate(hunk[1:]):
            if line.startswith("-"):
                # delete line from the file in memory
                del merge_file_lines[line_nr_minus-1]
        plus_counter = 0 # counts the number of added lines
        for k, line in enumerate(hunk[1:]):
            if line.startswith("+"):
                # insert line, one after another
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, line[1:])
                plus_counter += 1
    for line in merge_file_lines:
        # write the updated file back to the disk
        merge_file.write(line.rstrip() + "\n")
    merge_file.close()
    diff_file.close()
    print "\n\n"
def get_hunk(lines, i):
    hunk = []
    hunk.append(lines[i])
    # @@ -252,10 +251,9 @@
    lines = lines[i+1:]
    for line in lines:
        if line.startswith("@@"):
            # next hunk begins, so stop here
            break
        else:
            hunk.append(line)
    return hunk
The diff files look like this. This is the one that causes the problem:
--- thesis_words_12.txt 2011-01-17 20:35:50.804000000 +0100
+++ thesis_words_13.txt 2011-01-17 20:35:51.057000000 +0100
@@ -245 +245,2 @@
-As
+Per
+definition
@@ -248,3 +249 @@
-already
-proposes,
-"generative"
+generative
@@ -252,10 +251,9 @@
-that
-something
-is
-created
-based
-on
-a
-set
-of
-rules.
+"having
+the
+ability
+to
+originate,
+produce,
+or
+procreate."
+<http://www.thefreedictionary.com/generative>
Output:
[...]
Per
definition
the
"generative"
generative
means
"having
the
ability
to
originate,
produce,
or
procreate."
<http://www.thefreedictionary.com/generative>
that
[...]
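All the previous patches reproduce the text as expected. I have rewritten the tool many times, but the faulty behavior persists, so I am now completely out of ideas.
I would be very grateful for hints and suggestions on how to approach this differently. Thanks in advance!
Edit:
- In the end, each line should look like this: {date and time of the text change}word
This basically records the date and time at which a word was added to the text.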
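To make the target format concrete, here is a minimal sketch in Python 3 of the tagging step on its own. The helper name `tag_added_lines` and the timestamp format are assumptions made for illustration, as is the choice to leave empty lines untagged.

```python
def tag_added_lines(hunk_lines, date):
    """Return the added ("+") lines of a hunk, each prefixed with {date}."""
    tagged = []
    for line in hunk_lines:
        if line.startswith("+"):
            word = line[1:]  # strip the leading "+"
            # Assumption: empty lines carry no word, so they stay untagged.
            tagged.append("{%s}%s" % (date, word) if word else word)
    return tagged


# Hypothetical timestamp; the real value would come from the changeset.
print(tag_added_lines(["-As", "+Per", "+definition"], "2011-01-17 20:35"))
```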
2 Answers
0
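Try the parser from python-patch. It lets you apply the patches manually, one at a time, and see which one goes wrong. Its API is not stable yet, but the parser is, so you can simply copy patch.py from trunk/ into your project. Suggestions for the API you would like to see are welcome, though.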
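Whichever parser is used, one way to find the patch that goes wrong is to check, before deleting anything, that the lines at the target position are the ones the hunk expects to remove. The following is a minimal Python 3 sketch of that idea; it is not python-patch's API, and `apply_hunk` is a hypothetical helper that takes the already parsed removed and added lines (without their "-" and "+" markers) and the position in the current state of the file.

```python
def apply_hunk(lines, start, removed, added):
    """Replace `removed` with `added` at 1-based line `start`, in place.

    Raises ValueError if the lines currently at that position are not the
    ones the hunk expects to remove, which pinpoints a misapplied hunk.
    """
    index = start - 1
    found = lines[index:index + len(removed)]
    if found != removed:
        raise ValueError("hunk at line %d expected %r but found %r"
                         % (start, removed, found))
    # Slice assignment deletes the old lines and inserts the new ones at once.
    lines[index:index + len(removed)] = added


words = ["the", "already", "proposes,", '"generative"', "means"]
apply_hunk(words, 2, ["already", "proposes,", '"generative"'], ["generative"])
print(words)
```

With a check like this, a wrong line number fails loudly at the first bad hunk instead of silently producing scrambled text.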
1
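There was indeed a bug in the code: I had not interpreted the diff files correctly. I didn't realize that a line shift is necessary when one diff file contains multiple hunks. The line numbers on the "-" side of each header refer to the old file, so every hunk after the first has to be offset by the net number of lines that the earlier hunks added or removed.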
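The arithmetic can be checked on the three hunk headers from the question. This is a small Python 3 sketch; `parse_header` and `shifted_positions` are names made up for illustration, and the sketch assumes every hunk has a non-zero count on both sides (hunks with a count of 0 number their start line differently and are not handled here).

```python
import re

# Matches hunk headers such as "@@ -248,3 +249 @@"; an omitted count means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_header(header):
    """Return (old_start, old_count, new_start, new_count) as integers."""
    m = HUNK_HEADER.match(header)
    if m is None:
        raise ValueError("not a hunk header: %r" % header)
    old_start, old_count, new_start, new_count = m.groups()
    return (int(old_start), 1 if old_count is None else int(old_count),
            int(new_start), 1 if new_count is None else int(new_count))


def shifted_positions(headers):
    """For each hunk, return its old start line adjusted by the net
    growth of all earlier hunks in the same diff."""
    shift = 0
    positions = []
    for header in headers:
        old_start, old_count, new_start, new_count = parse_header(header)
        positions.append(old_start + shift)
        shift += new_count - old_count
    return positions


headers = ["@@ -245 +245,2 @@", "@@ -248,3 +249 @@", "@@ -252,10 +251,9 @@"]
print(shifted_positions(headers))  # [245, 249, 251]
```

The shifted positions equal the "+" start lines 245, 249 and 251, which is why deleting at the unshifted 248 and 252 removed the wrong words.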
def process_diff(filename, date, step_nr):
    merge_file = open("thesis_merged.txt", "r")
    merge_file_lines = [line.rstrip() for line in merge_file]
    merge_file.close()
    diff_file = open(filename, "r")
    print "-", filename, "-"*2, step_nr, "-"*2, date
    diff_file_lines = [line.rstrip() for line in diff_file]
    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )
    diff_file.close()
    # net number of lines added by the hunks processed so far
    line_shift = 0
    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]
        line_nr_minus = tmp[0].split(",")[0]
        minusses = 1 # an omitted count means one line
        if len( tmp[0].split(",") ) > 1:
            minusses = int( tmp[0].split(",")[1] )
        line_nr_minus = int(line_nr_minus[1:]) # 252
        line_nr_plus = tmp[1].split(",")[0]
        plusses = 1
        if len( tmp[1].split(",") ) > 1:
            plusses = int( tmp[1].split(",")[1] )
        line_nr_plus = int(line_nr_plus[1:]) # 251
        # old-file line numbers must account for the earlier hunks
        line_nr_minus += line_shift
        #@@ -248,3 +249 @@
        #-already
        #-proposes,
        #-"generative"
        #+generative
        # hunk[1] and hunk[2] are used here as the lists of removed and added
        # lines, which implies a revised get_hunk that is not shown
        if hunk[1]: # -
            for line in hunk[1]:
                del merge_file_lines[line_nr_minus-1]
        plus_counter = 0
        if hunk[2]: # +
            for line in hunk[2]:
                prefix = ""
                if len(line) > 1:
                    prefix = "{" + date + "}"
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, prefix + line[1:])
                plus_counter += 1
        line_shift += plusses - minusses