Comparing two CSV files

Posted 2024-04-27 01:10:05


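I'm having trouble comparing two CSV files and producing a separate report. I want my script to first match the IDs across the two files, then compare the rest of each row and write a separate report showing the differences. My script compares the two files and prints the lines that differ, but it doesn't work if the new file has extra rows.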

Samples of the two files:

Old file

ID  fname   lname   status
1   joe pol active
2   peters  dol active
3   john    nol active
4   mike    sol active

New file

ID  fname   lname   status
1   joe pol active
2   peter   dol active
67  ryan    olson   stop
3   johnny  nolly   stop 
4   mike    sol active

Code:

import csv

orig = open('OLD.csv','r')
new = open('NEW.csv','r')

Change = set(new) - set(orig)

print(Change)

with open('OLD.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('different.csv', 'w') as file_out:
        for line in Change:
            file_out.write(line)

orig.close()
new.close()
file_out.close()

1 answer
User
#1 · Posted 2024-04-27 01:10:05

Since CSV files are supposed to be comma-separated, I'm assuming your files can be put into the following format:

old.csv:

ID,fname,lname,status
1,joe,pol,active
2,peters,dol,active
3,john,nol,active
4,mike,sol,active

new.csv:

ID,fname,lname,status
1,joe,pol,active
2,peter,dol,active
67,ryan,olson,stop
3,johnny,nolly,stop
4,mike,sol,active
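
If the original files are whitespace-separated, as the samples in the question appear to be, they would need converting first. The following is a minimal sketch of that step, assuming fields are separated by runs of spaces or tabs and no field contains whitespace itself. The function name `whitespace_to_csv` and the file paths passed to it are hypothetical, not part of the question.

```python
import csv


def whitespace_to_csv(src_path, dst_path):
    """Rewrite a whitespace-delimited text file as a comma-separated file.

    Assumes no field contains spaces or tabs (an assumption about the data).
    """
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # split on any run of spaces or tabs
            if fields:  # skip blank lines
                writer.writerow(fields)
```

For example, `whitespace_to_csv("OLD.txt", "old.csv")` would produce a file in the format shown above, provided the assumption about whitespace holds for the data.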

You can then use the following code to turn them into a report:

from csv import reader


# Creates a row dictionary from file
def get_row_map(filename):
    row_map = {}

    # newline="" lets the csv module handle line endings itself
    with open(filename, newline="") as file:
        csv_reader = reader(file)
        _, *headers = next(csv_reader)

        # map ids to rows
        for row in csv_reader:
            idx, *rest = row
            row_map[int(idx)] = dict(zip(headers, rest))

    return row_map


old_row_map = get_row_map("old.csv")
new_row_map = get_row_map("new.csv")

with open("different.txt", "w") as out:

    # Only loop over matched ids, sorted so the report order is stable
    for row_id in sorted(old_row_map.keys() & new_row_map.keys()):
        old_row, new_row = old_row_map[row_id], new_row_map[row_id]

        # only proceed if rows are not exactly the same
        if old_row != new_row:

            # write out report: one line per changed column, in header order
            out.write("ID: %d\n" % row_id)
            for key in old_row:
                if old_row[key] != new_row.get(key):
                    out.write(
                        "%s -> old: %s, new: %s\n"
                        % (key, old_row[key], new_row.get(key))
                    )

This writes the following different.txt:

ID: 2
fname -> old: peters, new: peter
ID: 3
fname -> old: john, new: johnny
lname -> old: nol, new: nolly
status -> old: active, new: stop
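
The report above only covers IDs present in both files, so the extra row with ID 67 in the new file is left out. If those rows should be reported too, one possible approach is to take the set difference of the ID keys in each direction. This is a rough sketch rather than part of the code above; the helper names `added_and_removed` and `format_extra_rows` are hypothetical, and it assumes row maps shaped like the ones `get_row_map` returns (ID mapped to a dict of column values).

```python
def added_and_removed(old_rows, new_rows):
    """Return (added, removed): rows whose ID is only in new_rows or only in old_rows."""
    added = {i: new_rows[i] for i in sorted(new_rows.keys() - old_rows.keys())}
    removed = {i: old_rows[i] for i in sorted(old_rows.keys() - new_rows.keys())}
    return added, removed


def format_extra_rows(added, removed):
    """Build one report line per unmatched row."""
    lines = []
    for row_id, row in added.items():
        lines.append("ID: %d -> only in new file: %s" % (row_id, ", ".join(row.values())))
    for row_id, row in removed.items():
        lines.append("ID: %d -> only in old file: %s" % (row_id, ", ".join(row.values())))
    return lines
```

Called with `old_row_map` and `new_row_map` for the sample files, this would yield the single line `ID: 67 -> only in new file: ryan, olson, stop`, which could be written to different.txt alongside the per-column differences.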
