Python - Display rows with repeated values in a CSV file

7 votes

4 answers

23001 views

Asked 2025-04-18 12:56

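I have a .csv file with several columns, one of which holds random numbers, and I want to find the repeated numbers. If there are any repeats (which would be odd, but that is exactly what I want to check), I want to display or save the entire rows those numbers appear in.

To make it clearer, my file looks roughly like this:

First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc

The result I want is:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find the repeated numbers, I store that column in a dictionary and count how many times each number appears.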

import csv
from collections import Counter, defaultdict, OrderedDict

# file and col_2 are placeholders for the CSV path and the column index.
with open(file, 'rt') as inputfile:
    data = csv.reader(inputfile)

    seen = defaultdict(set)
    counts = Counter(row[col_2] for row in data)

print("Numbers and times they appear: %s" % counts)

The result I see is:

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

The problem now is that I can't connect each number with its count so that I can work with it afterwards. If I do this:

for value in counts:
        if counts > 1:
            print counts

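I only get the numbers themselves, which is not what I want, and I also want to print the whole row...

In short, I'm looking for a way to do this:

if there's a repeated number:
        print rows containing that number
else:
        print "No repetitions"

Thanks, everyone.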


4 Answers

You can find the duplicated rows quite simply with pandas:

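import pandas as pd

# csv_file, fields, column_name and csv_file2 are placeholders to fill in.
df = pd.read_csv(csv_file, names=fields, index_col=False)
# keep=False marks every occurrence of a repeated value, not only the later ones.
df = df[df.duplicated([column_name], keep=False)]
df.to_csv(csv_file2, index=False)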

Answered 2025-04-18 by Python大师


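We can simply loop through the file twice:

First pass: record how many times each value in the third column occurs.
Second pass: loop through again and print the lines whose third-column value occurs more than once.

See this: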

awk -F, 'FNR==NR{a[$3]++; next}
         {if (a[$3]>1) {print}}' file file

Test

$ awk -F, 'FNR==NR{a[$3]++; next} {if (a[$3]>1) {print}}' a a
Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
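
The same two-pass idea can be sketched in Python with the standard csv module. This is only a minimal sketch under stated assumptions: the helper name `rows_with_repeated_value` and its parameters are made up for illustration, the third column is taken to be index 2, and values are stripped because the sample data has a space after each comma. Unlike the awk version, it holds the rows in memory and makes both passes over that list instead of reading the file twice.

```python
import csv
from collections import Counter


def rows_with_repeated_value(lines, column=2):
    """Return the rows whose value in `column` occurs more than once.

    Hypothetical helper for illustration; `lines` is any iterable of
    CSV text lines, such as an open file or a list of strings.
    """
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    # First pass: count how often each whitespace-stripped value occurs.
    counts = Counter(row[column].strip() for row in rows)
    # Second pass: keep the rows whose value was counted more than once.
    return [row for row in rows if counts[row[column].strip()] > 1]


sample = [
    "First, Whatever, 230, Whichever, etc",
    "Fourth, Whatever, 18, Whichever, etc",
    "Fifth, Whatever, 14, Whichever, etc",
    "Eighth, Whatever, 18, Whichever, etc",
]
repeated = rows_with_repeated_value(sample)
if repeated:
    for row in repeated:
        print(",".join(row))
else:
    print("No repetitions")
```

For a real file, an open file object can be passed in place of `sample`, since csv.reader accepts any iterable of lines.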

Answered 2025-04-18 by Python大师


You should build the dictionary as follows, so that repeated entries do not overwrite each other:

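# rows_by_num maps each number to the list of rows that contain it
# (renamed from `dict`, which shadows the built-in type).
if num not in rows_by_num:
    rows_by_num[num] = []
rows_by_num[num].append(val)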

Then iterate over the lists stored in the dictionary; if the list for a key holds more than one item, that number appears more than once.
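
The grouping described above could look something like the following. It is a rough sketch rather than the answerer's exact code: the function name `group_rows_by_value`, the in-memory sample rows, and the use of `collections.defaultdict` in place of the explicit key check are all assumptions made for illustration.

```python
from collections import defaultdict


def group_rows_by_value(rows, column=2):
    """Map each value in `column` to the list of rows that contain it."""
    groups = defaultdict(list)
    for row in rows:
        # Strip whitespace so ' 18' and '18' land under the same key.
        groups[row[column].strip()].append(row)
    return groups


# Hypothetical sample rows, already split into columns.
rows = [
    ["Third", "Whatever", "46", "Whichever", "etc"],
    ["Fourth", "Whatever", "18", "Whichever", "etc"],
    ["Eighth", "Whatever", "18", "Whichever", "etc"],
]
groups = group_rows_by_value(rows)
# A value is repeated when its list holds more than one row.
repeated = {value: found for value, found in groups.items() if len(found) > 1}
if repeated:
    for value, found in repeated.items():
        print(value, len(found), found)
else:
    print("No repetitions")
```

Because each key keeps its full rows, the count and the matching rows are available together, which is the link the question was missing.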

Answered 2025-04-18 by Python大师


Try this; it may work for you.

entries = set()            # values seen so far in the third column
duplicate_entries = set()  # values seen more than once
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.add(columns[2])
        else:
            duplicate_entries.add(columns[2])

if duplicate_entries:
    # Second pass: print and save every row whose value is repeated.
    with open('in.txt', 'r') as my_file, open('out.txt', 'w') as out_file:
        for line in my_file:
            columns = line.strip().split(',')
            if columns[2] in duplicate_entries:
                print(line.strip())
                out_file.write(line)
else:
    print("No repetitions")

Answered 2025-04-18 by Python大师

