Trying to compare two CSV files and write the differences to outpu

Posted 2024-04-26 22:41:54


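I am working on a script that takes the differences between two CSV files and writes them to a new CSV file, but only where the same row (by row number) in the two input files contains different data, e.g. row 3 in file 1 has "mike", "basketball player" and row 3 in file 2 has "mike", "basketball player". The script should pick up those rows, print them and write them to the output CSV. It works, but with some issues (I know this question has been asked several times before, but other people have approached it differently from me, and since I am fairly new to coding I do not really understand their code).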

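In the new CSV file, every letter of the output ends up in its own cell (see the screenshot below). I believe it has something to do with the delimiter/quotechar/quoting on line 37. I want each value in its own cell, without full stops, multiple spaces, commas or "|".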
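The one-letter-per-cell symptom described above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 (the script in this question uses Python 2 print statements) and an in-memory buffer instead of the real files; write_rows and the sample line are hypothetical names and data introduced only for illustration:

```python
import csv
import io


def write_rows(rows):
    """Write rows to an in-memory buffer with the question's writer settings."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=",", quotechar="|",
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# A line read straight from a file object is a plain string (made-up sample).
raw_line = "mike,basketball player\n"

# Passing the string itself as a row: the writer iterates over it character
# by character, so every letter becomes its own cell. The comma character is
# then a field that contains the delimiter, so it is wrapped in the quotechar.
per_letter = write_rows([raw_line])  # begins: m,i,k,e,|,|,b,...

# Parsing the line with csv.reader first gives a list of fields instead.
parsed_row = next(csv.reader([raw_line]))
per_field = write_rows([parsed_row])

print(repr(per_letter))
print(repr(per_field))
```

If this is what is happening in the script, it would also account for the stray "|" characters: they are the configured quotechar wrapped around single-character fields such as the comma.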

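The other issue is that it takes a very long time to run. I am working with datasets of up to 50,000 rows, and a run can take over an hour. Why is this? Are there any suggestions to speed it up, such as moving something outside the for loop? I did try the difflib approach earlier, but I could only print the whole of "input_file1" and could not compare that file against the other one.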
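The run time is consistent with the second file being rescanned from the top for every row of the first, which makes the number of line reads grow with the square of the row count. A rough sketch of that arithmetic, assuming Python 3 and in-memory text in place of the real files (the function names and the sample data are hypothetical):

```python
import io


def count_reads_nested(text1, text2):
    """Count line reads when file 2 is rescanned for every row of file 1."""
    reads = 0
    for _line1 in io.StringIO(text1):
        # A fresh StringIO stands in for reopening file 2 on each outer row;
        # there is no break, so the whole of file 2 is read every time.
        for _line2 in io.StringIO(text2):
            reads += 1
    return reads


def count_reads_paired(text1, text2):
    """Count row pairs when both files are walked once, in step."""
    reads = 0
    for _pair in zip(io.StringIO(text1), io.StringIO(text2)):
        reads += 1
    return reads


# 200 made-up rows keep the demonstration quick.
sample = "".join("row%d\n" % i for i in range(200))

print(count_reads_nested(sample, sample))  # 200 * 200 line reads
print(count_reads_paired(sample, sample))  # 200 row pairs
```

At 50,000 rows the nested pattern amounts to 50,000 × 50,000 = 2.5 billion line reads, against 50,000 row pairs for the paired pattern.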

# aim of script is to compare csv files and output difference as a new csv

# import necessary libraries
import csv

# File1 = open(raw_input("path:"),"r") #filename, mode
# File2 = open(raw_input("path:"),"r") #filename, mode

# selects the 2 input files to be compared
input_file1 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book1.csv"
input_file2 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book2.csv"
# creates the blank output csv file
output_path = "G:/savestuffhereqwerty/electorate_meshblocks/outputs/output2.csv"
a = open(input_file1, "r")
output_file = open(output_path,"w")
output_file.close()
count = 0

with open(input_file1) as fp1:


    for row_number1, row_value1 in enumerate(fp1):
        if row_number1 == count:
            print "got to 1st point"
            value1 = row_value1

            with open(input_file2) as fp2:
                for row_number2, row_value2 in enumerate(fp2):
                    if row_number2 == count:
                        print "got to 2nd point"
                        value2 = row_value2

                        if value1 == value2:
                            print value1, value2
                        else:
                            print value1, value2
                            with open(output_path, 'wb') as f:
                                writer = csv.writer(f, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
                                # testing to see if the code writes text to the csv
                                writer.writerow(["test1"])
                                writer.writerow(["test2", "test3", "test4"])
                                writer.writerows([value1, value2])
                                print "code reached writing stage"
        count += 1
        print count
print "done"
# replace(",",".")

[Screenshot of printed output] [Screenshot of output CSV]


1 answer
User
#1 · Posted 2024-04-26 22:41:54

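Since you want to compare the two files row by row, you should not iterate over the second file for every row of the first file. You can simply zip the two csv readers and filter the rows: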

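import csv

input_file1 = "foo"
input_file2 = "bar"
output_path = "baz"

# Open both inputs once and walk them in step.
with open(input_file1) as fin1, open(input_file2) as fin2:
    read1 = csv.reader(fin1)
    read2 = csv.reader(fin2)
    # Keep the file-1 version of each row that differs from the row at the
    # same position in file 2.
    diff_rows = (row1 for row1, row2 in zip(read1, read2) if row1 != row2)
    # On Python 3, open CSV files with newline=''; on Python 2, use 'wb' here.
    with open(output_path, 'w') as fout:
        writer = csv.writer(fout)
        writer.writerows(diff_rows)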

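This solution assumes that both files have the same number of rows: zip stops at the end of the shorter file, so any extra rows in the longer one are ignored.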
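If the files may differ in length, one possible variant pairs the rows with itertools.zip_longest, which pads the shorter input with None. This is only a sketch under assumptions: it targets Python 3, the helper names diff_rows and write_diff are hypothetical, and the output layout (row number, source label, then the fields) is just one arbitrary choice. The sample data is made up.

```python
import csv
import io
from itertools import zip_longest


def diff_rows(file1, file2):
    """Yield (row_number, row1, row2) wherever the two files differ.

    A row that is missing because one file is shorter is reported as None.
    """
    pairs = zip_longest(csv.reader(file1), csv.reader(file2))
    for number, (row1, row2) in enumerate(pairs, start=1):
        if row1 != row2:
            yield number, row1, row2


def write_diff(file1, file2, out):
    """Write both versions of every differing row to the open file 'out'."""
    writer = csv.writer(out)
    for number, row1, row2 in diff_rows(file1, file2):
        writer.writerow([number, "file1"] + (row1 or []))
        writer.writerow([number, "file2"] + (row2 or []))


# In-memory stand-ins for the two input files.
text1 = "mike,basketball player\nanna,coach\n"
text2 = "mike,football player\nanna,coach\nsam,referee\n"

out = io.StringIO(newline="")
write_diff(io.StringIO(text1), io.StringIO(text2), out)
print(out.getvalue())
```

With real files, the same functions can be given handles opened with open(path, newline=''), which is how the csv module documentation recommends opening CSV files on Python 3.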
