How to delete CSV rows in Python

Published 2024-03-28 23:55:13


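I am trying to compare two CSV files (fileA and fileB) and remove from fileA any rows that are not found in fileB. I would like to do this without creating a third file. I thought I could use the csv writer module for this, but now I am second-guessing myself.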

Currently, I am using the following code to record the comparison data from fileB:

import csv

removal_list = set()
with open('fileB', newline='') as file_b:
    reader1 = csv.reader(file_b)
    next(reader1)  # skip the header row
    for row in reader1:
        # key each row on its first and third columns
        removal_list.add((row[0], row[2]))

This is where I am stuck; I don't know how to delete the rows:

with open('fileA', 'a', newline='') as file_a:
    with open('fileB', newline='') as file_b:
        writer = csv.writer(file_a)
        reader2 = csv.reader(file_b)
        next(reader2)
        for row in reader2:
            if (row[0], row[2]) not in removal_list:
                # If row was not present in file B, delete it from file A.
                # stuck here: writer.<HowDoIRemoveRow>(row)
                pass

3 answers

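This solution uses fileinput with inplace=True, which writes to a temporary file and then automatically renames it to your filename at the end. You cannot delete rows from a file; you can only rewrite it with the rows you want.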

if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.

fileA:

h1,h2,h3
a,b,c
d,e,f
g,h,i
j,k,l

fileB:

h1,h2,h3
a,b,c
1,2,3
g,h,i
4,5,6

import csv
import fileinput
import sys

with open('fileB', newline='') as file_b:
    r = csv.reader(file_b)
    next(r)  # skip header
    seen = {(row[0], row[2]) for row in r}

# With inplace=True, sys.stdout is redirected to fileA while it is being read
with fileinput.input('fileA', inplace=True) as f:
    print(next(f), end='')  # write header as first line

    w = csv.writer(sys.stdout, lineterminator='\n')
    for row in csv.reader(f):
        if (row[0], row[2]) in seen:  # write it only if it's in fileB
            w.writerow(row)

fileA after running the script:

h1,h2,h3
a,b,c
g,h,i
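
If losing the original fileA is a concern, the backup file mentioned in the documentation quote can be kept around. Below is a minimal sketch of that variation, assuming Python 3; the function name keep_matching_rows and the '.bak' extension are made up for illustration, while the backup argument itself is part of fileinput.input().

```python
import csv
import fileinput
import sys


def keep_matching_rows(path_a, path_b, backup_ext='.bak'):
    """Rewrite path_a in place, keeping rows whose (col 0, col 2) key is in path_b.

    Hypothetical helper: the original path_a is preserved as path_a + backup_ext.
    """
    with open(path_b, newline='') as file_b:
        reader = csv.reader(file_b)
        next(reader)  # skip header
        seen = {(row[0], row[2]) for row in reader}

    # inplace=True redirects sys.stdout to path_a; a non-empty backup
    # extension tells fileinput to keep the renamed original afterwards.
    with fileinput.input(path_a, inplace=True, backup=backup_ext) as f:
        print(next(f), end='')  # copy the header line unchanged
        writer = csv.writer(sys.stdout, lineterminator='\n')
        for row in csv.reader(f):
            if (row[0], row[2]) in seen:
                writer.writerow(row)
```

Calling keep_matching_rows('fileA', 'fileB') should leave the filtered rows in fileA and the untouched original in fileA.bak.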

As Lennart described, you cannot modify a CSV file in place while you are iterating over it.

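If you are really opposed to creating a third file, you might want to look into using a string buffer with StringIO. The idea is to build the new, desired contents of fileA in memory. At the end of the script, you can write the contents of the buffer over fileA.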

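import csv
from io import StringIO

# removal_list is the set of (row[0], row[2]) keys built from fileB in the question.
# Build the new contents of fileA in an in-memory buffer.
new_a_buf = StringIO()
with open('fileA', newline='') as file_a:
    writer = csv.writer(new_a_buf)
    reader2 = csv.reader(file_a)
    writer.writerow(next(reader2))  # copy the header row
    for row in reader2:
        if (row[0], row[2]) in removal_list:  # keep only rows that also appear in fileB
            writer.writerow(row)

# At this point, the new contents (new_a_buf) exist only in memory
with open('fileA', 'w', newline='') as file_a:
    file_a.write(new_a_buf.getvalue())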

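CSV is not a database format. It is read and written as a whole, and you cannot delete rows from the middle. So the only way to do this without creating a third file is to read the file completely into memory and then write it back out without the offending rows.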
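
A minimal sketch of that read-everything-then-rewrite approach, assuming Python 3 and that both files fit comfortably in memory. The function name filter_csv_in_memory and the key_cols parameter are hypothetical; the default columns (0, 2) mirror the key used in the question.

```python
import csv


def filter_csv_in_memory(path_a, path_b, key_cols=(0, 2)):
    """Overwrite path_a with only the rows whose key also appears in path_b.

    Hypothetical helper; returns the number of data rows kept.
    """
    def key(row):
        return tuple(row[i] for i in key_cols)

    with open(path_b, newline='') as file_b:
        reader = csv.reader(file_b)
        next(reader)  # skip header
        keys_in_b = {key(row) for row in reader}

    # Read all of path_a into memory before reopening it for writing
    with open(path_a, newline='') as file_a:
        reader = csv.reader(file_a)
        header = next(reader)
        kept = [row for row in reader if key(row) in keys_in_b]

    # Reopening in 'w' mode truncates the file, so it is rewritten from scratch
    with open(path_a, 'w', newline='') as file_a:
        writer = csv.writer(file_a)
        writer.writerow(header)
        writer.writerows(kept)
    return len(kept)
```

Note that if the script is interrupted during the final write, fileA can be left truncated, since nothing else holds a copy of the data.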

But in general, it is better to use a third file.
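
One way the third-file approach could look, as a sketch rather than a definitive implementation: write the kept rows to a temporary file in the same directory, then swap it over fileA with os.replace. The function name filter_csv_via_temp_file and its parameters are hypothetical; keep_keys is assumed to be a set of key tuples such as the one built from fileB in the question.

```python
import csv
import os
import tempfile


def filter_csv_via_temp_file(path_a, keep_keys, key_cols=(0, 2)):
    """Replace path_a with a copy containing only rows whose key is in keep_keys.

    Hypothetical helper: keep_keys is a set of tuples built from the key_cols columns.
    """
    # Create the temporary file next to path_a so os.replace stays on one filesystem
    dir_name = os.path.dirname(os.path.abspath(path_a))
    tmp = tempfile.NamedTemporaryFile(
        'w', newline='', dir=dir_name, suffix='.tmp', delete=False)
    try:
        with tmp, open(path_a, newline='') as src:
            reader = csv.reader(src)
            writer = csv.writer(tmp)
            writer.writerow(next(reader))  # copy the header row
            for row in reader:
                if tuple(row[i] for i in key_cols) in keep_keys:
                    writer.writerow(row)
        # Both files are closed here; swap the new file into place
        os.replace(tmp.name, path_a)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
        raise
```

Because fileA is only swapped at the very end, a failure partway through leaves the original file untouched, which is the main reason to prefer this over rewriting in place.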
