为什么Python列表在内存中的大小与文档不符？

Question

我一直在努力理解Python是如何为列表分配内存的，特别是在通过添加元素来扩展列表时。这个问题很好地介绍了基础知识，并解释了随着列表长度增加，内存分配的增量是如何变大的。还有一篇文章，详细解释了源代码，源代码可以在这里找到。

我想询问一下这个解释：

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * Add padding to make the allocated size multiple of 4.
 * The growth pattern is:  0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
 * Note: new_allocated won't overflow because the largest possible value
 *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
 */
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
/* Do not overallocate if the new size is closer to overallocated size
 * than to the old size.
 */

特别是这个计算：

new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

我对这个计算的理解是，新分配的内存将等于当前大小（触发扩展的大小）加上当前大小右移三位（相当于除以八）再加上6。然后，这个结果与3的补码进行“与”运算，以确保最后两位为零，这样这个值就能被4整除。

我用这段代码生成了我的列表并报告了大小：

a = [i for i in range(108)]
print(sys.getsizeof(a))  # 920 bytes

b = [i for i in range(109)]
print(sys.getsizeof(b))  # 1080 bytes

当元素达到109个时，触发了重新调整大小，此时新大小为928字节。

上面的计算应该是这样的：

1048字节小于报告的1060字节。

文档中提到，这个过程在处理小列表时可能效果不好，所以我尝试了一个更大的列表。我不会在这里重复二进制的部分。

a = [i for i in range(10640)]
print(sys.getsizeof(a))  # 85176 bytes

b = [i for i in range(10641)]
print(sys.getsizeof(b))  # 95864 bytes

[85184 + (86184 >> 3) + 6] = 95838字节

当应用“& ~3”后，这个值会降到95836字节。同样，这个值还是低于报告的95864字节。

为什么报告的重新调整大小会大于计算出的重新调整大小呢？

性能优化数据结构内存管理计算机科学源代码分析列表扩展内存分配二进制运算

为什么Python列表在内存中的大小与文档不符？

1 个回答

撰写回答