What is the fastest way to increment part of a list?

3 answers

User

Answer 1 · edited 2024-05-15 13:36:26

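If you need to update large ranges of a large list many times, use a more suitable data structure so that each update does not take O(n) time.

One such data structure is a segment tree, in which each list element corresponds to a leaf node of the tree. The true value of a list element is the sum of the values on the path from its leaf node up to the root node. This way, adding a number to a single internal node is effectively the same as adding it to every list element covered by that subtree.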
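To make the path-sum idea concrete, here is a minimal sketch on a tree with four leaves. The layout (root at index 1, leaves at indices 4 to 7) and the names `tree` and `value_at` are assumptions chosen for illustration; they are not part of the answer's class.

```python
# Heap-style layout for 4 leaves (illustrative assumption): index 1 is the
# root, indices 2 and 3 are its children, and indices 4..7 are the leaves
# that stand for list elements 0..3. Index 0 is unused.
tree = [0] * 8


def value_at(i):
    """Sum the values on the path from the leaf for element i up to the root."""
    node = 4 + i
    total = 0
    while node >= 1:
        total += tree[node]
        node //= 2
    return total


tree[2] += 10  # node 2 covers leaves 4 and 5, i.e. elements 0 and 1
tree[7] += 3   # a single leaf, i.e. element 3 only

values = [value_at(i) for i in range(4)]
print(values)  # [10, 10, 0, 3]
```

A single write to node 2 changed two list elements at once, which is what lets a range update touch only O(log n) nodes.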

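The data structure supports get/set operations by index in O(log n) time, and it also supports an add-in-range operation in O(log n) time. The solution below uses a binary tree stored in a list of length m + n, where m is the smallest power of 2 that is >= n (so fewer than 3n entries):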

class RangeAddList:
    def __init__(self, vals):
        # list length
        self._n = len(vals)
        # smallest power of 2 >= list length
        self._m = 1 << (self._n - 1).bit_length()
        # list representing binary tree; leaf nodes offset by _m
        self._vals = [0]*self._m + vals

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, list(self))

    def __len__(self):
        return self._n

    def __iter__(self):
        for i in range(self._n):
            yield self[i]

    def __getitem__(self, i):
        if i not in range(self._n):
            raise IndexError()

        # add up values on the path from leaf to root (the root is at
        # index 1; index 0 is unused and always 0)
        t = 0
        i += self._m
        while i > 0:
            t += self._vals[i]
            i >>= 1
        return t

    def __setitem__(self, i, x):
        # add difference (new value - old value)
        self._vals[self._m + i] += x - self[i]

    def add_in_range(self, i, j, x):
        if i not in range(self._n + 1) or j not in range(self._n + 1):
            raise IndexError()

        # add at internal nodes spanning range(i, j)
        i += self._m
        j += self._m
        while i < j:
            if i & 1:
                self._vals[i] += x
                i += 1
            if j & 1:
                j -= 1
                self._vals[j] += x
            i >>= 1
            j >>= 1

Example:

>>> r = RangeAddList([0] * 10)
>>> r.add_in_range(0, 4, 10)
>>> r.add_in_range(6, 9, 20)
>>> r.add_in_range(3, 7, 100)
>>> r
RangeAddList([10, 10, 10, 110, 100, 100, 120, 20, 20, 0])
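As a cross-check of the semantics, the same three updates can be replayed on a plain list. This is only a sketch: `add_in_range_naive` is a hypothetical helper, and it assumes that `add_in_range(i, j, x)` covers the half-open range `range(i, j)`, as the comment in the class suggests.

```python
def add_in_range_naive(lst, i, j, x):
    """Add x to every element of lst[i:j] in place; O(j - i) per call."""
    for k in range(i, j):
        lst[k] += x


expected = [0] * 10
add_in_range_naive(expected, 0, 4, 10)    # elements 0..3
add_in_range_naive(expected, 6, 9, 20)    # elements 6..8
add_in_range_naive(expected, 3, 7, 100)   # elements 3..6
print(expected)  # [10, 10, 10, 110, 100, 100, 120, 20, 20, 0]
```

The result matches the `RangeAddList` output above, which is a handy property to assert when testing the tree version on random updates.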

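It turns out that NumPy is so well optimized that you need lists of length around 50,000 before the segment tree catches up. For lists of length 100,000 on my machine, the segment tree's updates are still only about twice as fast as NumPy's O(n) range updates. You may want to benchmark with your own data.

User

Answer 2 · edited 2024-05-15 13:36:26

Use NumPy for fast array operations. See the following example: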

import numpy as np

lst = np.array([1,2,3,4,5,6,7,8])

# add 2 at all indices from 4 till the end of the array
lst[4:] += 2

print(lst)
# [ 1  2  3  4  7  8  9 10]

User

Answer 3 · edited 2024-05-15 13:36:26

Here is a fast way to do it:

lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
new_list = [*lst1[:4], *[x+2 for x in lst1[4:]]]
# or even better, update the list in place with slice assignment
lst1[4:] = [x+2 for x in lst1[4:]]
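The two variants differ in more than speed: rebuilding produces a new list object, while slice assignment mutates the existing one. The following sketch illustrates the difference; the names `alias`, `rebuilt`, and `snapshot` are introduced here purely for illustration.

```python
lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
alias = lst1  # a second reference to the same list object

# Rebuilding creates a new list and leaves lst1 untouched.
rebuilt = [*lst1[:4], *[x + 2 for x in lst1[4:]]]
snapshot = list(lst1)  # copy of lst1 taken before the in-place update

# Slice assignment replaces the tail of the existing list object,
# so every reference to it sees the change.
lst1[4:] = [x + 2 for x in lst1[4:]]

print(rebuilt)   # [1, 2, 3, 4, 7, 8, 9, 10]
print(snapshot)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(alias)     # [1, 2, 3, 4, 7, 8, 9, 10]
```

If other parts of a program hold references to the list, slice assignment is probably the variant you want; rebuilding requires rebinding every reference.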

As for speed, numpy is not faster for a list this small:

import timeit
import numpy as np

lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
npa = np.array(lst1)


def numpy_it():
    global npa
    npa[4:] += 2


def python_it():
    global lst1
    lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]


print(timeit.timeit(numpy_it))
print(timeit.timeit(python_it))

On my machine:

1.7008036
0.6737076000000002

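But for anything serious, numpy is far better than generating a new list just for the slice that needs replacing, which is slightly better than regenerating the entire list (which in turn is much better than in-place replacement with a loop, as in your example):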

import timeit
import numpy as np

lst1 = list(range(0, 10000))
npa = np.array(lst1)
lst2 = list(range(0, 10000))
lst3 = list(range(0, 10000))


def numpy_it():
    global npa
    npa[4:] += 2


def python_it():
    global lst1
    lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]


def python_it_slice():
    global lst2
    lst2[4:] = [x+2 for x in lst2[4:]]


def python_inplace():
    global lst3
    for i in range(4, len(lst3)):
        lst3[i] = lst3[i] + 2


n = 10000
print(timeit.timeit(numpy_it, number=n))
print(timeit.timeit(python_it_slice, number=n))
print(timeit.timeit(python_it, number=n))
print(timeit.timeit(python_inplace, number=n))

Results:

0.057994199999999996
4.3747423
4.5193105000000005
9.949074000000001
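Since the ranking depends on list size, it may be worth timing the pure-Python variants at several sizes. The sketch below is one way to do that with the standard library only; the helpers `per_call_seconds`, `make_slice_update`, and `make_loop_update` are hypothetical names, and the sizes and repeat counts are arbitrary assumptions.

```python
import timeit


def per_call_seconds(func, number=50, repeat=3):
    """Best-of-`repeat` average time of one call to func, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def make_slice_update(data, start, delta):
    """Return a function that adds delta to data[start:] via slice assignment."""
    def update():
        data[start:] = [x + delta for x in data[start:]]
    return update


def make_loop_update(data, start, delta):
    """Return a function that adds delta to data[start:] with an index loop."""
    def update():
        for i in range(start, len(data)):
            data[i] += delta
    return update


timings = {}
for size in (10, 1_000, 10_000):
    slice_time = per_call_seconds(make_slice_update(list(range(size)), 4, 2))
    loop_time = per_call_seconds(make_loop_update(list(range(size)), 4, 2))
    timings[size] = (slice_time, loop_time)
    print(f"n={size:>6}: slice {slice_time:.2e} s, loop {loop_time:.2e} s")
```

Closures are used instead of `global` so that each timed function carries its own list. Taking the minimum over several repeats reduces noise from other processes; absolute numbers will vary by machine and Python version.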
