Remove duplicate lines from a text file, except lines containing "{" or "}"

Posted 2024-03-29 09:33:36


I have a very large text file whose content looks like this:

@INBOOK{Ackermann1999-b, 
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
  year = {1980}, 
  timestamp = {1995-12-02} 
}      

I want to delete duplicate lines, except lines that contain a brace, { or }. The result should look like this:

@INBOOK{Ackermann1999-b, 
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
  year = {1980}, 
  timestamp = {1995-12-02} 
} 
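Here is the rule as stated, written as a pure function over a list of lines. This is a sketch, not an answer from the thread, and the name dedupe_keep_braces is hypothetical. Every line containing { or } is kept. Any other line is kept only the first time it appears.

```python
def dedupe_keep_braces(lines):
    """Return lines in order, dropping repeats of lines without '{' or '}'.

    Lines containing a brace are always kept, even if they repeat.
    """
    seen = set()  # brace-free lines already emitted
    result = []
    for line in lines:
        if "{" in line or "}" in line:
            result.append(line)  # structural line: always keep
        elif line not in seen:
            seen.add(line)
            result.append(line)  # first occurrence of a brace-free line
    return result


# Made-up input: two short entries sharing the brace-free line "    Y and".
sample = [
    "@INBOOK{A,\n",
    "  author = {X and\n",
    "    Y and\n",
    "    Y and\n",
    "    Z},\n",
    "}\n",
    "@INBOOK{B,\n",
    "    Y and\n",
    "}\n",
]
result = dedupe_keep_braces(sample)
# Both closing braces survive; every repeat of "    Y and" is dropped,
# including the one inside entry B.
```

Taking the rule literally has two consequences. First, the seen set is shared across the whole file, so a brace-free line is also dropped when it recurs in a later entry, as happens to entry B here. Second, the sample above has three brace-free continuation lines, and each one repeats. Assuming the repeats are byte-for-byte identical, the rule keeps the first copy of each of those three lines. Its output for the sample would therefore have nine lines, three more than the hand-written result, which keeps only the lines with braces.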

Thanks to Vinay Sajip, I found this Python snippet:

lines_seen = set()  # holds lines already seen
with open("literatur_dupl.txt", "r") as infile, open("literatur_clean.txt", "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
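This explains why the closing brace goes missing. The loop keeps only the first copy of every line. In a file with many entries, the line "}" recurs at the end of every entry, and an identical author line can recur too. Below is a minimal in-memory equivalent of the loop, run on made-up two-entry input. The function name dedupe_all is hypothetical.

```python
def dedupe_all(lines):
    """In-memory equivalent of the loop above: keep the first copy of every line."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:  # not a duplicate
            result.append(line)
            seen.add(line)
    return result


# Two made-up entries that share an author line and the closing brace.
entries = [
    "@INBOOK{A,\n",
    "  author = {Ackermann, K.-F.},\n",
    "}\n",
    "@INBOOK{B,\n",
    "  author = {Ackermann, K.-F.},\n",
    "}\n",
]
result = dedupe_all(entries)
# Entry B keeps only its "@INBOOK{B," line: its author line and its
# closing brace were already seen in entry A.
```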

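But it also deletes the lines with a closing brace } and the lines with identical author data. So I need a condition for the braces.

Can someone show me how to add this condition?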

Thanks in advance
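One way to add the condition is to check for a brace before consulting the set, so that brace lines are always written. This is a sketch under assumptions, not a confirmed answer from the thread, and the helper name dedupe_file is hypothetical. It reads the input one line at a time, so only the set of distinct brace-free lines stays in memory, which helps with a very large file. The UTF-8 default is also an assumption; change it to match the file's actual encoding.

```python
def dedupe_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, dropping repeated lines that contain no brace.

    Lines containing '{' or '}' are always written. Other lines are written
    only the first time they appear anywhere in the file.
    """
    seen = set()  # brace-free lines already written
    with open(src_path, "r", encoding=encoding) as infile, \
         open(dst_path, "w", encoding=encoding) as outfile:
        for line in infile:
            if "{" in line or "}" in line:
                outfile.write(line)  # lines with a brace are always kept
            elif line not in seen:
                outfile.write(line)  # first copy of a brace-free line
                seen.add(line)
```

With the file names from the question, the call is dedupe_file("literatur_dupl.txt", "literatur_clean.txt").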

