Find repeated lines of text in Python, count the duplicates and the unique lines, and get the index of each line's first occurrence

Posted 2024-05-13 23:05:22


Please help me. My file looks like this:

This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio

I need output like this:

[[1,'we are working at BusinessBrio',4],[2,'Gitu is my beloved cat',4],[0,'This is a cat',2],[3,'Jery is also a cat',2],[4,'Boni is a nice dog',2]]

In addition, the output must be sorted in descending order of the duplicate count.


2 answers

Use Counter together with the sorted function:

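from collections import Counter

# Read the file into a list of lines without trailing newlines
with open("hel.txt", "r") as f:
    b = f.read().splitlines()

# Count how many times each distinct line occurs
counter = Counter(b)

output = []

for key, value in counter.items():
    # [index of first occurrence, line text, number of occurrences]
    output.append([b.index(key), key, value])

# Sort by count, highest first
out = sorted(output, key=lambda x: x[2], reverse=True)
print(out)

Output (on Python 3.7+, where entries with equal counts keep their order of first occurrence):

[[1, 'we are working at BusinessBrio', 4], [2, 'Gitu is my beloved cat', 4], [0, 'This is a cat', 2], [3, 'Jery is also a cat', 2], [4, 'Boni is a nice dog', 2]]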
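The answer above calls b.index(key) once per distinct line, which rescans the list each time. For larger files, a single pass that records each line's first index may be preferable. The following is a minimal sketch of that variation, not the answerer's code; the function name summarize_duplicates and the in-memory sample list are illustrative assumptions standing in for the lines read from the file.

```python
from collections import Counter


def summarize_duplicates(lines):
    """Return [first_index, text, count] rows sorted by count, descending."""
    counts = Counter(lines)
    first_index = {}
    for i, line in enumerate(lines):
        # setdefault only stores the index the first time a line is seen
        first_index.setdefault(line, i)
    rows = [[first_index[text], text, n] for text, n in counts.items()]
    # Sort by count (descending), then by first index so ties are deterministic
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Hypothetical stand-in for the file contents from the question
sample = [
    "This is a cat",
    "we are working at BusinessBrio",
    "Gitu is my beloved cat",
    "Jery is also a cat",
    "Boni is a nice dog",
    "Gitu is my beloved cat",
    "we are working at BusinessBrio",
] * 2

result = summarize_duplicates(sample)
print(result)
print(len(result))  # number of unique lines
```

Breaking ties on the first index makes the order independent of the Python version, and len(result) gives the number of unique lines that the question title also asks for.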

The text has no punctuation, so it is not obvious how the sentences should be separated. Assuming there is one sentence per line, you can just use Counter from collections.

data = '''
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
 '''
from collections import Counter
from pprint import pprint as pp

# Split the text into lines and count each distinct line
li = data.split('\n')

pp(Counter(li))

Counter({'we are working at BusinessBrio': 4,
         'Gitu is my beloved cat': 4,
         'Boni is a nice dog': 2,
         'This is a cat': 2,
         'Jery is also a cat': 2,
         '': 1,
         ' ': 1})
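The '' and ' ' entries in the output above come from the newline right after the opening ''' and the space-only line before the closing '''. One way to drop them is to strip each line and skip the empty ones before counting. The sketch below uses a shortened, illustrative data string rather than the full sample, and it assumes that leading and trailing whitespace on a line is not significant.

```python
from collections import Counter

# Shortened illustrative input with a blank line and a space-only last line
data = '''
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat

This is a cat
 '''

# Strip each line and keep only the non-empty ones before counting
cleaned = [line.strip() for line in data.splitlines() if line.strip()]
counts = Counter(cleaned)

# most_common() returns (text, count) pairs, highest count first
print(counts.most_common())
```

If whitespace inside the file does matter, filtering with `if line.strip()` while keeping the unstripped line is a possible alternative.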
