<p>A little feedback on both answers: neither approach is quite right, because of the way they pick the keys</p>
<p>They effectively treat the dict as a list and keep the first n items; if your data is set up that way, you probably don't need a dict at all (a sorted list would do instead)</p>
<p>Regarding <strong>Option 2</strong></p>
<p>I'd wager you're doing a lookup like this:</p>
<p><code>new_dict = {k:v for k,v in parent_dict.items() if k in selections}</code></p>
<p>In that case, performance degrades as <code>len(selections)</code> grows; one way to mitigate this somewhat is to use a set instead of a list for the selections. As written, it is effectively <code>O(m*n)</code> (where m is the size of the parent dict and n is the number of selections)</p>
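<p>As a quick sketch of that mitigation (the data and variable names here are illustrative, not from the question):</p>

```python
# Illustrative parent dict and selection list (not from the question).
parent_dict = {i: 0 for i in range(1000)}
selections = list(range(0, 1000, 7))

# One-time O(n) conversion; each `in` check then averages O(1)
# instead of an O(n) scan of the list.
selection_set = set(selections)
new_dict = {k: v for k, v in parent_dict.items() if k in selection_set}
```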
<p><em>Since the loop over n doesn't always have to run over the full n, it may be closer to <code>O((n**2-n)/2)</code>, where <code>(n**2-n)/2</code> is the sum of all integers from 1 to n-1</em></p>
<p>The removal loop in <strong>Option 1</strong>, on the other hand, is <code>O(2m-n)</code></p>
<p><em><code>O(m)</code> to copy the dict and <code>O(m-n)</code> to delete the unwanted keys</em></p>
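<p>For reference, Option 1's copy-and-delete loop presumably looks something like this (a sketch; the names are mine, not from the original answers):</p>

```python
# Sketch of Option 1: copy the whole dict, then delete unwanted keys.
parent_dict = {i: 0 for i in range(10)}
kept_keys = {2, 5, 7}

result = parent_dict.copy()        # O(m) copy
for key in parent_dict:            # iterate the original, not the copy
    if key not in kept_keys:       # O(1) average membership test with a set
        del result[key]            # each deletion is O(1) on average
```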
<p>There are non-vanilla solutions (lodash comes to mind) that are heavily optimized and may well be faster, but looking strictly at time complexity, it seems removal should be faster for realistic key selections</p>
<p>Which option is best likely depends on <code>m</code> &amp; <code>n</code></p>
<hr/>
<p><strong>I just had a blonde moment, but I'll leave my ignorance up for any future passers-by</strong></p>
<p>You should simply do this:
<code>new_dict = {k: parent_dict[k] for k in selections}</code></p>
<p>You don't pay the <code>O(m)</code> penalty of copying the dict, adding an item to a dict is <code>O(1)</code>, and looping over the selections is <code>O(n)</code>, so you're sitting pretty at <code>O(2n)</code>, i.e. <code>O(n)</code>. I don't think it gets any faster than that</p>
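<p>One caveat worth noting (my addition, not from the original): the key comprehension raises a <code>KeyError</code> if a selected key is missing from <code>parent_dict</code>. If that can happen, filter the selections first; it stays <code>O(n)</code> overall:</p>

```python
parent_dict = {'a': 1, 'b': 2}
selections = ['a', 'c']  # 'c' is absent from parent_dict

# Skip missing keys instead of raising KeyError.
subset = {k: parent_dict[k] for k in selections if k in parent_dict}
```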
<hr/>
<p>OK, here's the code:</p>
<pre><code>import random
import timeit

m = 3_000
n0 = 300
n1 = 600
trials = 10_000

def remove(parent_dict, kept_keys):
    result = parent_dict.copy()
    for key in parent_dict.keys():
        if key not in kept_keys:
            del result[key]
    return result

def add_dict_comprehension(parent_dict, kept_keys):
    return {k: v for k, v in parent_dict.items() if k in kept_keys}

def add_key_comprehension(parent_dict, kept_keys):
    return {k: parent_dict[k] for k in kept_keys}

def generate_test_data(m, n0, n1):
    parent_dict = {}
    for key in range(m):
        parent_dict[key] = 0
    target_keys = random.sample(range(m), random.randint(n0, n1))
    return parent_dict, target_keys

# The *_set variants reuse the same functions; only the key
# preprocessor (list vs. set) differs.
test_functions = {
    'remove': (remove, lambda x: x),
    'add_dict_comprehension': (add_dict_comprehension, lambda x: x),
    'add_key_comprehension': (add_key_comprehension, lambda x: x),
    'add_dict_comprehension_set': (add_dict_comprehension, lambda keys: set(keys)),
    'remove_set': (remove, lambda keys: set(keys)),
}

results = {k: [] for k in test_functions.keys()}
for _ in range(trials):
    parent_dict, target_keys = generate_test_data(m, n0, n1)
    for k, (func, preprocessor) in test_functions.items():
        # Not timing the preprocessing, assuming this is done on construction
        processed_keys = preprocessor(target_keys)
        starttime = timeit.default_timer()
        func(parent_dict, processed_keys)
        results[k].append(timeit.default_timer() - starttime)

def avg(lst):
    return sum(lst) / len(lst)

def stddev(lst):
    mean = avg(lst)
    r2 = sum([(i - mean)**2 for i in lst])
    bias = len(lst) - 1
    return (r2 / bias)**.5

result_data = {k: {'avg': avg(v), 'stddev': stddev(v)} for k, v in results.items()}
title_len = max(len(i) for i in result_data.keys())
for name, data in sorted(result_data.items(), key=lambda item: item[1]['avg']):
    print(f"{name}: {''.join([' ']*(title_len - len(name)))} Avg. {data['avg']} [Stddev: {data['stddev']}]")
</code></pre>
<p>As expected, the key comprehension is fastest, and using a set instead of a list helps considerably for Options 1 &amp; 2; a set's average lookup time is <code>O(1)</code>, so most of the search overhead of scanning a list is avoided</p>
<pre><code>add_key_comprehension: Avg. 8.616499952040612e-05 [Stddev: 4.227672839185718e-05]
add_dict_comprehension_set: Avg. 0.0001553589993272908 [Stddev: 3.535997011243502e-05]
remove_set: Avg. 0.00021654199925251306 [Stddev: 0.00017905888806226246]
add_dict_comprehension: Avg. 0.013669110001646913 [Stddev: 0.0031769500128693085]
remove: Avg. 0.0138856839996879 [Stddev: 0.004191406076651088]
</code></pre>