How do I remove duplicate permutations from a CSV file?

Posted 2024-04-19 05:56:57


I am trying to extract 3 columns from a large CSV file and detect permutations, so that only the unique triples are kept and written to another CSV file.

For example, if I have:

[8,9,15]
[78,35,98]
[90,35,56]
[64,89,98]
[15,8,9]...etc

the first and fifth triples must be recognized as the same, and only one of them kept. I wrote the following, but it does not work.

 import csv
 reader=csv.reader(open('file1.csv','r'), delimiter = ',')
 writer=csv.writer(open('mynew.csv', 'w'), delimiter=',')
 myset = set()
 for row in reader:
    if row[0] not in myset:
       writer.writerow(row)
    if row[1] not in myset:
       writer.writerow(row)
    if row[2] not in myset:
       writer.writerow(row)

1 answer
User
#1 · Posted 2024-04-19 05:56:57

Try this:

#!/usr/bin/env python3
import csv

myset = set()
with open('file1.csv', 'r', newline='') as src, open('mynew.csv', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter=',')
    writer = csv.writer(dst, delimiter=',')
    for row in reader:
        # a frozenset is hashable, so it can be inserted into a set, and it
        # ignores order: 8,9,15 and 15,8,9 produce the same key
        # this assumes no duplicates exist within the row like 1,1,2,3,4 (two 1's)
        # (otherwise you'll have to hash the row yourself)
        key = frozenset(row)
        if key not in myset:
            myset.add(key)
            writer.writerow(row)  # write only the first occurrence

print("set size: %d" % len(myset))
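
If a value can repeat within a row, the frozenset key above treats 1,1,2 and 1,2,2 as the same triple. A minimal sketch of one way around that is to use a sorted tuple as the key instead. The names `triple_key`, `unique_triples` and `dedupe_file`, and the default column indexes `(0, 1, 2)`, are hypothetical and only for illustration; values are compared as text, so `8` and `08` count as different.

```python
import csv


def triple_key(row, columns=(0, 1, 2)):
    """Order-independent key that still counts repeated values."""
    return tuple(sorted(row[i].strip() for i in columns))


def unique_triples(rows, columns=(0, 1, 2)):
    """Yield the selected columns of each row whose key was not seen before."""
    seen = set()
    for row in rows:
        # Skip blank or short rows that lack one of the requested columns.
        if len(row) <= max(columns):
            continue
        key = triple_key(row, columns)
        if key not in seen:
            seen.add(key)
            yield [row[i] for i in columns]


def dedupe_file(src_path, dst_path, columns=(0, 1, 2)):
    """Stream src_path row by row and write the unique triples to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(unique_triples(csv.reader(src), columns))
```

With the file names from the question, this would be called as `dedupe_file('file1.csv', 'mynew.csv')`. Only the set of keys is held in memory, not the rows themselves, which may help with a large input file.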
