How to count elements in a Python list most efficiently

6 votes

6 answers

11143 views

Asked 2025-04-16 08:36

This question is almost the same as the one here, except that I am asking for the most efficient way to get a sorted result.

I have a list (about 10 random integers in the range 0 to 12), for example:

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]

I want to write a function that returns a list of (item, count) tuples, sorted by the first element, for example:

output = [(4, 3), (5, 4), (6, 1), (7, 2)]

So far I have been using:

def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

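But I call this function nearly a million times, so I need it to be as fast as Python allows. My question is: how can I make this function take less time? (And what about memory?)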

I experimented a little but did not find an obvious solution:

from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"

stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[230]: 0.058799982070922852

stmt = "L = []; \nfor item in sorted(set(the_list)): \n    L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[233]: 0.065041065216064453

stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[236]: 0.098351955413818359

Thanks
Christophe

Tags: performance optimization, data structures, tuples, lists, memory management, sorting algorithms, counting

6 Answers

Take advantage of the "between 0 and 12" condition:

>>> the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
>>> answer1 = [0] * 13
>>> for i in the_list:
...    answer1[i] += 1
...
>>> answer1
[0, 0, 0, 0, 3, 4, 1, 2, 0, 0, 0, 0, 0]
>>> # You might be able to use that as-is:
...
>>> for i, v in enumerate(answer1):
...     if v: print i, v
...
4 3
5 4
6 1
7 2
>>> # Otherwise you can build the list that you specified:
...
>>> answer2 = [(i, v) for i, v in enumerate(answer1) if v]
>>> answer2
[(4, 3), (5, 4), (6, 1), (7, 2)]
>>>
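
The session above can be wrapped in a reusable function. This is only a minimal sketch: the name `count_in_range` and the `upper` parameter are hypothetical, and it assumes every value is an integer between 0 and `upper` inclusive.

```python
def count_in_range(the_list, upper=12):
    """Return sorted (item, count) pairs, assuming 0 <= item <= upper."""
    counts = [0] * (upper + 1)      # one slot per possible value
    for item in the_list:
        counts[item] += 1           # raises IndexError if item > upper
    # enumerate walks the slots in ascending order, so no sort is needed
    return [(i, n) for i, n in enumerate(counts) if n]


print(count_in_range([5, 7, 6, 5, 5, 4, 4, 7, 5, 4]))
# [(4, 3), (5, 4), (6, 1), (7, 2)]
```

Note that a negative value would silently index from the end of `counts`, so this only makes sense when the range guarantee really holds.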

Answered 2025-04-16 by Python大师


I would try:

from collections import defaultdict

def dupli(the_list):
    output = defaultdict(int)  # missing keys start at 0
    for item in the_list:
        output[item] += 1
    return sorted(output.items())
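
A closely related variant, not part of the suggestion above, is `collections.Counter` from the standard library, which does the tallying itself. The function name `dupli_counter` is hypothetical, and whether this beats the other approaches on 10-element lists is something to measure rather than assume.

```python
from collections import Counter


def dupli_counter(the_list):
    """Return (item, count) pairs sorted by item, using collections.Counter."""
    # Counter tallies every element in a single pass over the list
    return sorted(Counter(the_list).items())


print(dupli_counter([5, 7, 6, 5, 5, 4, 4, 7, 5, 4]))
# [(4, 3), (5, 4), (6, 1), (7, 2)]
```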

Answered 2025-04-16 by Python大师


Changing where the sort happens saves about 20% of the time.

Change this:

def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

to this:

def dupli(the_list):
    count = the_list.count # this optimization added courtesy of Sven's comment
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()
    return result

This is faster because the sorted() function has to build a new temporary list first, whereas sorting the result list in place does not.
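
To check the difference on your own machine, a timing sketch along these lines may help. The names `sort_first` and `sort_last` are hypothetical labels for the two versions above, and no timings are quoted because they vary with hardware and Python version.

```python
from timeit import timeit

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]


def sort_first(lst):
    # sorted() builds a new list from the set before counting
    return [(item, lst.count(item)) for item in sorted(set(lst))]


def sort_last(lst):
    count = lst.count                      # look the bound method up once
    result = [(item, count(item)) for item in set(lst)]
    result.sort()                          # sorts the result list in place
    return result


# both versions must return the same answer before timing them
assert sort_first(the_list) == sort_last(the_list)

for fn in (sort_first, sort_last):
    print(fn.__name__, timeit(lambda: fn(the_list), number=10000))
```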

Addendum: here is another approach that is 35% faster than your original:

def dupli(the_list):
    counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) if counts[i]]

Note: you may want to randomize the values in the_list. My final version of dupli comes out even faster when tested on other random data sets (import random; the_list=[random.randint(0,12) for i in xrange(10)])
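
A rough harness for that kind of randomized comparison might look like the following. It is a sketch under assumptions: it targets Python 3 (so `range` instead of `xrange`), and the names `dupli_count`, `dupli_buckets` and `samples`, the fixed seed, and the sample count are all illustrative choices rather than anything from the thread.

```python
import random
from timeit import timeit


def dupli_count(the_list):
    # original approach: one list.count() call per distinct value
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]


def dupli_buckets(the_list):
    # bucket approach: one pass, assuming values lie in 0..12
    counts = [0] * 13
    for n in the_list:
        counts[n] += 1
    return [(i, c) for i, c in enumerate(counts) if c]


random.seed(0)  # fixed seed so runs are repeatable
samples = [[random.randint(0, 12) for _ in range(10)] for _ in range(1000)]

# both versions must agree before timing means anything
assert all(dupli_count(s) == dupli_buckets(s) for s in samples)

for fn in (dupli_count, dupli_buckets):
    elapsed = timeit(lambda: [fn(s) for s in samples], number=10)
    print(fn.__name__, elapsed)
```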

Answered 2025-04-16 by Python大师

