Comparing two files for differences with Python

Published 2024-03-29 15:27:14


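I want to compare two files (take a line from the first file and look for it in the whole second file), see the differences between them, and write the lines missing from fileA.txt to the end of fileB.txt. I am new to Python, so at first I came up with a simple program: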

import difflib

file1 = "fileA.txt"
file2 = "fileB.txt"

# Read both files fully, then print the line-by-line diff
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
print(''.join(diff), end='')

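But as a result I get a combination of both files, with each line carrying the appropriate marker. I know I could look for the lines that start with the "-" marker and write them to the end of fileB.txt, but for large files (around 100 MB) this approach is inefficient. Can anyone help me improve the program?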
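For reference, the marker-filtering idea mentioned above might look roughly like the sketch below. The helper name `lines_removed_by_ndiff` and the sample lists are made up for illustration. Note that `difflib.ndiff` compares by position, so a line that exists in the second file at a different place can still be reported with "- "; this is one more reason the approach is a poor fit for the task.

```python
import difflib

def lines_removed_by_ndiff(a_lines, b_lines):
    """Return the lines that ndiff marks with '- ' (reported as only in a_lines).

    Hypothetical helper: it strips the two-character marker from each hit.
    """
    return [d[2:] for d in difflib.ndiff(a_lines, b_lines) if d.startswith('- ')]

# Made-up sample data, not taken from the original log files.
a = ["one\n", "two\n", "three\n"]
b = ["one\n", "three\n"]
missing = lines_removed_by_ndiff(a, b)
print(missing)
```

The returned lines could then be appended to fileB.txt by opening it in append mode, but both files still have to be held in memory and diffed in full.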

The files are structured as follows:

Input:

fileA.txt:

^{pr2}$

fileB.txt:

^{3}$

Output:

fileB_after.txt:

Oct  9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct  9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct  9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct  9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct  9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct  9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)

2 answers

Try this in bash:

cat fileA.txt fileB.txt | sort -M | uniq > new_file.txt

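sort -M: sorts by an initial string consisting of any amount of blanks followed by a month name abbreviation, which is folded to upper case and compared in the order 'JAN' < 'FEB' < ... < 'DEC'. Invalid names compare low to valid names. The 'LC_TIME' locale category determines the month spellings.

uniq: filters out adjacent duplicate lines, which is why the input is sorted first.

|: passes the output of one command to another command for further processing.

What this does is take the two files, sort them in the way described above, keep the unique lines, and store them in new_file.txt.

Note: this is not a Python solution, but you tagged the question with linux, so I thought you might be interested. You can also find more details about the commands used here.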
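For readers who want to stay in Python, the pipeline above can be approximated as in the sketch below. It assumes every line starts with a three-letter English month abbreviation; the names `MONTHS`, `month_key`, and `merge_unique` are illustrative and do not come from any library. Ties within a month are broken by plain string comparison, which only approximates what `sort` does in the C locale.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def month_key(line):
    """Sort key: month rank first, then the raw line as a tie-breaker.

    Unknown month names get rank 0, so they sort before valid ones.
    """
    abbrev = line.lstrip()[:3].upper()
    rank = MONTHS.index(abbrev) + 1 if abbrev in MONTHS else 0
    return (rank, line)

def merge_unique(lines_a, lines_b):
    """Union the two line collections (dropping duplicates) and sort by month_key."""
    return sorted(set(lines_a) | set(lines_b), key=month_key)

# Made-up sample lines in the same "Mon DD HH:MM:SS" layout as the logs.
demo = merge_unique(
    ["Oct 11 22:17:31 b", "Oct  9 12:19:16 a"],
    ["Oct  9 12:19:16 a", "Sep 30 01:00:00 c"],
)
print("\n".join(demo))
```

Using a set removes duplicates wherever they occur, so unlike `uniq` this does not rely on the duplicates being adjacent.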

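Read both files and convert each to a set

Take the union of the two sets
Sort the union by time
Join the sorted lines into a string with newlines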

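import datetime

file1 = "fileA.txt"
file2 = "fileB.txt"

def timestamp(line):
    # Parse the leading "Oct  9 12:19:16" stamp (the logs carry no year)
    return datetime.datetime.strptime(' '.join(line.split()[:3]), '%b %d %H:%M:%S')

# Strip newlines so that identical lines compare equal in both sets
with open(file1) as f:
    sa = set(line.rstrip('\n') for line in f)
with open(file2) as f:
    sb = set(line.rstrip('\n') for line in f)
print('\n'.join(sorted(sa.union(sb), key=timestamp)))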

Output (lines sharing a timestamp can come out in either order, because sets are unordered):



Oct  9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct  9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct  9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct  9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct  9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct  9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
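If the goal is only to append the lines missing from fileB.txt, as the question puts it, a streaming variant may be enough. The sketch below keeps just the lines of the destination file in memory and reads the source file line by line. The function name `append_missing` and the demo file contents are assumptions made for illustration, not part of either answer.

```python
import os
import tempfile

def append_missing(src_path, dst_path):
    """Append to dst_path each line of src_path that dst_path lacks.

    Returns the number of lines appended. Hypothetical helper.
    """
    seen = set()
    needs_newline = False
    with open(dst_path) as dst:
        for line in dst:
            # Remember whether the last line read ends without a newline.
            needs_newline = not line.endswith("\n")
            seen.add(line.rstrip("\n"))
    added = 0
    with open(src_path) as src, open(dst_path, "a") as out:
        for line in src:
            text = line.rstrip("\n")
            if text in seen:
                continue
            if needs_newline:
                # Terminate the unterminated last line before appending.
                out.write("\n")
                needs_newline = False
            out.write(text + "\n")
            seen.add(text)
            added += 1
    return added

# Demo on throwaway files; the contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "fileA.txt")
    b_path = os.path.join(tmp, "fileB.txt")
    with open(a_path, "w") as f:
        f.write("line 1\nline 2\nline 3\n")
    with open(b_path, "w") as f:
        f.write("line 2\n")
    added = append_missing(a_path, b_path)
    with open(b_path) as f:
        result = f.read()

print(added, repr(result))
```

The appended lines land at the end of the file in source order, so getting the time-ordered result shown in the question would still need a sort step like the one in the answer above.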
