Python 链表 O(1) 插入/删除

8 投票
11 回答
19103 浏览
提问于 2025-04-15 18:35

我在找一个适合Python的链表和相关算法的实现。每次问别人,他们都推荐用Python自带的列表,但性能测试显示,列表的插入和删除操作对我们的应用来说是个瓶颈。实现一个简单的链表很简单,但我想知道有没有成熟的库,能提供一些像排序、合并、拼接、查找、上下界等操作……

我知道这个问题可能重复,但在任何搜索引擎上搜索Python列表,结果都很糟糕,大多数人只说在Python中链表是没必要的(呸!)。

补充一下:我需要在列表的任何位置插入和删除,而不仅仅是两端。

好吧,你问我了:我需要维护一个有几十万条记录的有序列表。我会从头开始,或者从通过二分查找找到的位置,逐个遍历列表,对每个条目进行处理。当找到一个符合条件的条目时,就把它从列表中删除,然后在被删除条目的前一个位置开始,对列表的一个子集进行另一次二分查找,直到找到一个事先统计好的位置。忽略错误情况,修改后的条目可以用来创建另一个链表,并将其拼接到通过第二次二分查找找到的新位置。然后从被删除条目的位置继续遍历。有时候,可能会在列表的任何地方添加或删除几千个连续的有序条目。有时还需要逐步查找和删除几千个不连续的条目。

Python的列表是不可接受的,因为插入和删除的成本太高,而二分查找的速度提升对整体成本来说完全无关紧要。我们内部的测试证实了这一点。

如果我遗漏了什么细节,也许我可以给你发一份我公司的保密协议,然后我们可以私下讨论这个问题。讽刺结束。

11 个回答

7

大家都在问为什么需要链表,这让我觉得很奇怪。链表是最基本的数据结构之一,这不是没有原因的:它有一些其他主要数据结构没有的特点。如果你需要这些特点,那就得用链表或者它的近亲。如果你不明白为什么链表是一个重要的数据结构,不能总是用双端队列或二叉树来替代,那你可能在“数据结构入门”课上没学明白。

这里有一个简单的实现,支持一些常见的功能:在任何位置插入节点(只要你有那个节点的引用),把链表分成两个链表,以及在一个链表中间插入另一个链表(拼接)。它还支持常见的Python接口:push(推入)、pop(弹出)、pushleft(从左边推入)、popleft(从左边弹出)、extend(扩展)、普通迭代和切片迭代(getiter)。

我刚写的这个代码,经过了文档测试,但还没有在实际生产环境中测试过;可能还有一些bug。

def _ref(obj):
    """
    weakref.ref has a bit of braindamage: you can't take a weakref to None.
    This is a major hassle and a needless limitation; work around it.
    """
    from weakref import ref
    if obj is None:
        class NullRef(object):
            def __call__(self): return None
        return NullRef()
    else:
        return ref(obj)

class _node(object):
    def __init__(self, obj):
        self.obj = obj
        self._next = None
        self._prev = _ref(None)
    def __repr__(self):
        return "node(%s)" % repr(self.obj)

    def __call__(self):
        return self.obj

    @property
    def next(self):
        return self._next

    @property
    def prev(self):
        return self._prev()

# Implementation note: all "_last" and "prev" links are weakrefs, to prevent circular references.
# This is important; if we don't do this, every list will be a big circular reference.  This would
# affect collection of the actual objects in the list, not just our node objects.
#
# This means that _node objects can't exist on their own; they must be part of a list, or nodes
# in the list will be collected.  We also have to pay attention to references when we move nodes
# from one list to another.
class llist(object):
    """
    Implements a doubly-linked list.
    """
    def __init__(self, init=None):
        self._first = None
        self._last = _ref(None)
        if init is not None:
            self.extend(init)

    def insert(self, item, node=None):
        """
        Insert item before node.  If node is None, insert at the end of the list.
        Return the node created for item.

        >>> l = llist()
        >>> a = l.insert(1)
        >>> b = l.insert(2)
        >>> d = l.insert(4)
        >>> l._check()
        [1, 2, 4]
        >>> c = l.insert(3, d)
        >>> l._check()
        [1, 2, 3, 4]
        """
        item = _node(item)

        if node is None:
            if self._last() is not None:
                self._last()._next = item
                item._prev = _ref(self._last())
            self._last = _ref(item)
            if self._first is None:
                self._first = item
        else:
            assert self._first is not None, "insertion node must be None when the list is empty"
            if node._prev() is not None:
                node._prev()._next = item
            item._prev = node._prev
            item._next = node
            node._prev = _ref(item)
            if node is self._first:
                self._first = item
        return item

    def remove(self, node):
        """
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l.remove(c) # Check removing from the middle
        3
        >>> l._check()
        [1, 2, 4, 5]
        >>> l.remove(a) # Check removing from the start
        1
        >>> l._check()
        [2, 4, 5]
        >>> l.remove(e) # Check removing from the end
        5
        >>> l._check()
        [2, 4]
        """
        if self._first is node:
            self._first = node._next
        if self._last() is node:
            self._last = node._prev
        if node._next is not None:
            node._next._prev = node._prev
        if node._prev() is not None:
            node._prev()._next = node._next
        node._next = None
        node._prev = _ref(None)
        return node.obj

    def __nonzero__(self):
        """
        A list is true if it has any elements.

        >>> l = llist()
        >>> bool(l)
        False
        >>> l = llist([1])
        >>> bool(l)
        True
        """
        return self._first is not None


    def __iter__(self):
        """
        >>> l = llist([1,2,3])
        >>> [i() for i in l]
        [1, 2, 3]
        """
        return self.getiter(self._first, self._last())

    def _check(self):
        if self._last() is None:
            assert self._last() is None
            return []
        node = self._first
        ret = []
        while node is not None:
            if node._next is None:
                assert node == self._last()
            if node._prev() is None:
                assert node == self._first
            if node._next is not None:
                assert node._next._prev() == node
            if node._prev() is not None:
                assert node._prev()._next == node
            ret.append(node.obj)
            node = node._next
        return ret

    def getiter(self, first, last):
        """
        Return an iterator over [first,last].

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> start = l.append(2)
        >>> l.extend([3,4,5,6])
        >>> end = l.append(7)
        >>> l.extend([8,9])
        >>> [i() for i in l.getiter(start, end)]
        [2, 3, 4, 5, 6, 7]
        """
        class listiter(object):
            def __init__(self, first, last):
                self.node = first
                self.final_node = last

            def __iter__(self): return self
            def next(self):
                ret = self.node
                if ret is None:
                    raise StopIteration
                if ret is self.final_node:
                    self.node = None
                else:
                    self.node = self.node._next
                return ret
        return listiter(first, last)

    def append(self, item):
        """
        Add an item to the end of the list.

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> l.append(2)
        node(2)
        >>> l._check()
        [1, 2]
        """
        return self.insert(item, None)

    def appendleft(self, item):
        """
        Add an item to the beginning of the list.

        >>> l = llist()
        >>> l.appendleft(1)
        node(1)
        >>> l.appendleft(2)
        node(2)
        >>> l._check()
        [2, 1]
        """
        return self.insert(item, self._first)

    def pop(self):
        """
        Remove an item from the end of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.pop()
        3
        >>> l.pop()
        2
        >>> l.pop()
        1
        >>> l.pop()
        Traceback (most recent call last):
        ...
        IndexError: pop from empty llist
        """
        if self._last() is None:
            raise IndexError, "pop from empty llist"
        return self.remove(self._last())

    def popleft(self):
        """
        Remove an item from the beginning of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.popleft()
        1
        >>> l.popleft()
        2
        >>> l.popleft()
        3
        >>> l.popleft()
        Traceback (most recent call last):
        ...
        IndexError: popleft from empty llist
        """
        if self._first is None:
            raise IndexError, "popleft from empty llist"
        return self.remove(self._first)

    def splice(self, source, node=None):
        """
        Splice the contents of source into this list before node; if node is None, insert at
        the end.  Empty source_list.  Return the first and last nodes that were moved.

        # Test inserting at the beginning.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, a)
        (node(4), node(6))
        >>> l._check()
        [4, 5, 6, 1, 2, 3]
        >>> l2._check()
        []

        # Test inserting in the middle.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, b)
        (node(4), node(6))
        >>> l._check()
        [1, 4, 5, 6, 2, 3]
        >>> l2._check()
        []

        # Test inserting at the end.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, None)
        (node(4), node(6))
        >>> l._check()
        [1, 2, 3, 4, 5, 6]
        >>> l2._check()
        []

        # Test inserting a list with a single item.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4])
        >>> l.splice(l2, b)
        (node(4), node(4))
        >>> l._check()
        [1, 4, 2, 3]
        >>> l2._check()
        []
        """
        if source._first is None:
            return
        first = source._first
        last = source._last()

        if node is None:
            if self._last() is not None:
                self._last()._next = source._first
            source._first._prev = self._last
            self._last = source._last
            if self._first is None:
                self._first = source._first
        else:
            source._first._prev = node._prev
            source._last()._next = node
            if node._prev() is not None:
                node._prev()._next = source._first
            node._prev = source._last
            if node is self._first:
                self._first = source._first
        source._first = None
        source._last = _ref(None)
        return first, last

    def split(self, start, end=None):
        """
        Remove all items between [node, end] and return them in a new list.  If end is None,
        remove until the end of the list.

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l._check()
        [1, 2, 3, 4, 5]
        >>> l2 = l.split(c, e)
        >>> l._check()
        [1, 2]
        >>> l2._check()
        [3, 4, 5]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(a, c)
        >>> l._check()
        [4, 5]
        >>> l2._check()
        [1, 2, 3]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(b, d)
        >>> l._check()
        [1, 5]
        >>> l2._check()
        [2, 3, 4]
        """
        if end is None:
            end = self._last()

        ret = llist()

        # First, move the region into the new list.  It's important to do this first, or
        # once we remove the nodes from the old list, they'll be held only by weakrefs and
        # nodes could end up being collected before we put it into the new one.
        ret._first = start
        ret._last = _ref(end)

        # Hook our own nodes back together.
        if start is self._first:
            self._first = end._next
        if end is self._last():
            self._last = start._prev

        if start._prev() is not None:
            start._prev()._next = end._next
        if end._next is not None:
            end._next._prev = start._prev
        start._prev = _ref(None)
        end._next = None

        return ret

    def extend(self, items):
        """
        >>> l = llist()
        >>> l.extend([1,2,3,4,5])
        >>> l._check()
        [1, 2, 3, 4, 5]
        """
        for item in items:
            self.append(item)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
8

在Python中,列表在末尾进行操作的效率是O(1),也就是说,添加或删除元素非常快。如果你打算以一种半顺序的方式进行插入——就像在C语言中,只在列表中间保持一个指针作为“光标”——那么你可以通过使用两个Python列表来省去很多麻烦。一个列表用来存放光标前面的元素,另一个用来存放光标后面的元素;移动光标时,只需从一个列表中取出下一个元素并添加到另一个列表中。这样,你就可以在光标位置以O(1)的效率插入元素,而且比创建一个全新的数据结构要简单得多,还能利用许多现有的列表功能。

不过,如果你需要在列表中有多个引用,那么你可能就得考虑使用某种链表了。

补充:你不会真的想在链表上做“二分查找”吧?因为二分查找在本质上是顺序的数据结构上是没有意义的……

总之,如果你能接受线性时间的搜索,并且你的插入操作总是能保持列表的顺序而不需要重新排序,那么一个简单的链表可能就足够了。如果你搜索和遍历的次数差不多,建议考虑一些索引速度快的数据结构,如果需要重新排序,像树这样的结构会更好。

10

这里有一篇博客文章,分享了你的困扰。里面有一个链表的实现和性能比较。


不过,也许使用blist会更好一些(可以从这里获取)?

在以下情况下,BList的速度稍微慢于Python的列表(复杂度分别是O(log n)和O(1)):

  1. 一个很大的列表,长度从来不变。
  2. 一个很大的列表,插入和删除操作只发生在列表的末尾(后进先出)。

说完这些限制条件,接下来是一些BList比内置列表快得多的使用场景:

  1. 在一个大列表中插入或删除元素(复杂度分别是O(log n)和O(n))
  2. 从大列表中提取大块数据(复杂度分别是O(log n)和O(n))
  3. 制作大列表的浅拷贝(复杂度分别是O(1)和O(n))
  4. 修改大列表中的大块数据(复杂度分别是O(log n + log k)和O(n + k))
  5. 将一个列表复制多次以创建一个大的稀疏列表(复杂度分别是O(log k)和O(kn))

需要注意的是,它实际上是作为B+树实现的,这使得这些操作的性能都非常优秀。

撰写回答