Python：从字典中快速移除大量键的最佳策略

2 投票

4 回答

4649 浏览

提问于 2025-04-18 00:01

我知道Python中的字典其实是一个哈希表。当字典的大小超过当前最大容量的2/3时，它会自动调整大小（具体可以参考这个链接：Objects\dictnotes.txt）。

我需要删除很多字典里的项目（几千个），比如每小时删除一次，删除的标准很简单——如果键小于等于某个条件，就删除。

我知道可以用字典推导式来创建新的字典，也知道在遍历字典时会调整字典的大小。

# dict comprehension
new_d = {key: value for key, value in d.iteritems() if key >= guard_condition }

# resize while iterating
for key in d:
    if key < guard_condition:
        del d[key]

还有其他方法可以实现这个目的吗？哪种方法更快呢？

性能优化数据结构字典哈希表条件过滤自动调整大小字典推导式项目删除

4 个回答

-1

从性能的角度来看，无论是在Ruby还是Python中，使用while循环的速度总是比列表推导式快。但因为你是在用Python编程，所以你可能更倾向于使用列表推导式，因为如果速度是最重要的，你就不会选择用Python了。

回答于 2025-04-18 由 Python大师

分享举报

接着我上面的评论，建议使用 LRU（最近最少使用）或 LFU（最不常使用）缓存：

http://en.wikipedia.org/wiki/Cache_algorithms

当有新数据进来的时候，可以根据合适的策略来决定哪些数据需要被删除。这样可以把删除数据的成本分摊到整个应用的使用过程中，而不是在某些时候集中删除，这样会更高效。

我觉得用 del[key] 从字典中删除数据的速度没有更快的方法，但有很多更好的方式可以实现你想要的效果（我猜测你是想这样做）。LRU 和 LFU 是非常流行且常用的解决方案。

回答于 2025-04-18 由 Python大师

分享举报

我试过用IPython，结果是这样的：

In [140]: d = {item: item for item in xrange(10000)};

In [142]: guard_condition = 9000;

In [144]: %timeit new_d = {key: value for key, value in d.iteritems() if key >=
100 loops, best of 3: 2.54 ms per loop

In [140]: d = {item: item for item in xrange(10000)};

In [149]: def del_iter(d, guard_condition):
   .....:     for key in d.keys():
   .....:         if key < guard_condition:
   .....:             del d[key]
   .....:

In [150]: %timeit del_iter(d, guard_condition)
1000 loops, best of 3: 232 us per loop

差别大约是100次循环 * 2.54毫秒 = 254000微秒，和1000次循环 * 232微秒 = 232000微秒，这对我来说几乎可以忽略不计。

我会使用字典推导，因为这样更容易阅读。

我觉得执行时间非常简单，我同意@Hyperboreus说的关于过早优化的问题。

回答于 2025-04-18 由 Python大师

分享举报

这要看你的字典有多大，以及你要删除多少元素。如果你删除的字典键少于80%，那么“边遍历边调整大小”会比“字典推导式”快。如果你删除的字典键超过80%，那么“字典推导式”会更快。你可以自己试试下面的代码。

import cProfile, pstats, StringIO
pr = cProfile.Profile()
pr.enable()

guard_condition = int(raw_input("Enter guard_condition: "))

d = {item: item for item in xrange(10000000)};

new_d = {key: value for key, value in d.iteritems() if key >= guard_condition }

def del_iter(d, guard_condition):
    for key in d.keys():
        if key < guard_condition:
            del d[key]

del_iter(d, guard_condition)

pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()

当guard_condition设置为7000000时，输出结果是

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    2.794    2.794    2.794    2.794 {raw_input}
     1    1.263    1.263    1.263    1.263 dictDel1.py:7(<dictcomp>)
     1    1.030    1.030    1.030    1.030 dictDel1.py:9(<dictcomp>) <-- dict comprehension
     1    0.892    0.892    0.976    0.976 dictDel1.py:11(del_iter) <-- resize while iterating
     1    0.085    0.085    0.085    0.085 {method 'keys' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

当guard_condition设置为8500000时，输出结果是

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    3.316    3.316    3.316    3.316 {raw_input}
     1    1.247    1.247    1.247    1.247 dictDel1.py:7(<dictcomp>)
     1    0.937    0.937    1.052    1.052 dictDel1.py:11(del_iter) <-- resize while iterating
     1    0.787    0.787    0.787    0.787 dictDel1.py:9(<dictcomp>) <-- dict comprehension
     1    0.115    0.115    0.115    0.115 {method 'keys' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

回答于 2025-04-18 由 Python大师

分享举报

Python：从字典中快速移除大量键的最佳策略

4 个回答

撰写回答