Python链表O(1)插入/删除

2024-03-29 09:30:14 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在寻找Python的链表和相关算法实现。我问的每个人都建议使用内置的Python列表,但性能测量表明,列表的插入和删除是应用程序的瓶颈。实现一个简单的链表是很简单的,但我想知道是否有一个成熟的库,它包括一些操作,如排序、合并、拼接、搜索、下限/上限等

我知道这是一个骗局,但在任何搜索引擎上搜索python列表都会得到预期的糟糕结果,大多数人只是说python(pfft!)中不需要链表

PS:我需要在列表中的任何位置插入和删除,而不仅仅是结尾

好的,你自找的: 我需要维护一个有几十万条记录的有序列表。我将(一个接一个地)向前遍历列表,在每个条目上使用访问者,从开始或通过二进制搜索找到的位置开始。当找到与谓词匹配的条目时,会将其从列表中删除,然后从删除条目的前一个位置开始,对列表的子集执行另一个二进制搜索,直到事先统计确定位置为止。忽略错误条件,修改后的条目可用于创建另一个链表,该链表拼接到通过第二次二进制搜索找到的新位置。从删除条目的位置继续迭代。有时,可以在列表中的任何位置添加或删除数千个连续的有序条目。有时,必须以增量方式搜索和删除数千个不连续的条目

python的列表是不可接受的,因为插入/删除的成本太高,二进制搜索速度的微小提高与总成本完全无关。我们的内部测试证实了这一点

如果我忽略了任何细节,也许我可以通过电子邮件向您发送一份我公司的保密协议副本,并就此事与您私下通信讽刺。结束()


Tags: 目的算法应用程序列表排序二进制条目性能
3条回答

Python列表是O(1) for operations at the end of the list。如果您将以半连续的方式进行所有插入操作(与C类似,只将一个指针作为某种“游标”保留在列表的中间),那么只需使用两个Python列表就可以节省大量的工作量。一个列表显示光标前的内容,一个列表显示光标后的内容;移动光标涉及从一个列表中拖出下一个项目并将其附加到另一个列表中。这使您可以在光标位置进行任意O(1)插入,与创建一个完整的新数据结构相比,所需的工作量和重复性要少得多,从而可以重用许多现有的列表函数

但是,对于允许列表中有多个引用的完全通用的情况,您可能无法创建某种类型的链接列表

编辑:您没有认真考虑对链接列表进行“二进制搜索”,是吗?二进制搜索在本质上是连续的数据结构上甚至没有意义

无论如何,如果你对线性时间搜索没问题,并且你的插入总是保持列表顺序而不重新排序,那么一个简单的链表可能就是你所需要的。如果你像迭代一样做大量的搜索,你应该考虑快速索引,如果需要诉诸的话,像树之类的东西会更好。p>

令人费解的是,每个人都要求有理由需要一个链接列表。链表是最基本的数据结构之一,其原因是:它们具有其他主要数据结构所缺乏的属性,如果需要这些属性,则需要链表或其近亲之一。如果您不理解为什么链表是一种重要的数据结构,不能总是用deque或二叉树来替换,那么您永远不应该传递“数据结构简介”类

这里有一个快速的实现,支持通常的功能:在给定节点引用的任意点进行常量时间插入,将列表拆分为两个列表,并将一个列表插入另一个列表的中间(拼接)。支持通用Python接口:push、pop、pushleft、popleft、extend、普通迭代、片上迭代(getiter)

我刚刚写了这篇文章,所以它是博士论文,但没有经过生产测试;可能仍然有bug

def _ref(obj):
    """
    weakref.ref has a bit of braindamage: you can't take a weakref to None.
    This is a major hassle and a needless limitation; work around it.
    """
    from weakref import ref
    if obj is None:
        class NullRef(object):
            def __call__(self): return None
        return NullRef()
    else:
        return ref(obj)

class _node(object):
    def __init__(self, obj):
        self.obj = obj
        self._next = None
        self._prev = _ref(None)
    def __repr__(self):
        return "node(%s)" % repr(self.obj)

    def __call__(self):
        return self.obj

    @property
    def next(self):
        return self._next

    @property
    def prev(self):
        return self._prev()

# Implementation note: all "_last" and "prev" links are weakrefs, to prevent circular references.
# This is important; if we don't do this, every list will be a big circular reference.  This would
# affect collection of the actual objects in the list, not just our node objects.
#
# This means that _node objects can't exist on their own; they must be part of a list, or nodes
# in the list will be collected.  We also have to pay attention to references when we move nodes
# from one list to another.
class llist(object):
    """
    Implements a doubly-linked list.
    """
    def __init__(self, init=None):
        self._first = None
        self._last = _ref(None)
        if init is not None:
            self.extend(init)

    def insert(self, item, node=None):
        """
        Insert item before node.  If node is None, insert at the end of the list.
        Return the node created for item.

        >>> l = llist()
        >>> a = l.insert(1)
        >>> b = l.insert(2)
        >>> d = l.insert(4)
        >>> l._check()
        [1, 2, 4]
        >>> c = l.insert(3, d)
        >>> l._check()
        [1, 2, 3, 4]
        """
        item = _node(item)

        if node is None:
            if self._last() is not None:
                self._last()._next = item
                item._prev = _ref(self._last())
            self._last = _ref(item)
            if self._first is None:
                self._first = item
        else:
            assert self._first is not None, "insertion node must be None when the list is empty"
            if node._prev() is not None:
                node._prev()._next = item
            item._prev = node._prev
            item._next = node
            node._prev = _ref(item)
            if node is self._first:
                self._first = item
        return item

    def remove(self, node):
        """
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l.remove(c) # Check removing from the middle
        3
        >>> l._check()
        [1, 2, 4, 5]
        >>> l.remove(a) # Check removing from the start
        1
        >>> l._check()
        [2, 4, 5]
        >>> l.remove(e) # Check removing from the end
        5
        >>> l._check()
        [2, 4]
        """
        if self._first is node:
            self._first = node._next
        if self._last() is node:
            self._last = node._prev
        if node._next is not None:
            node._next._prev = node._prev
        if node._prev() is not None:
            node._prev()._next = node._next
        node._next = None
        node._prev = _ref(None)
        return node.obj

    def __nonzero__(self):
        """
        A list is true if it has any elements.

        >>> l = llist()
        >>> bool(l)
        False
        >>> l = llist([1])
        >>> bool(l)
        True
        """
        return self._first is not None


    def __iter__(self):
        """
        >>> l = llist([1,2,3])
        >>> [i() for i in l]
        [1, 2, 3]
        """
        return self.getiter(self._first, self._last())

    def _check(self):
        if self._last() is None:
            assert self._last() is None
            return []
        node = self._first
        ret = []
        while node is not None:
            if node._next is None:
                assert node == self._last()
            if node._prev() is None:
                assert node == self._first
            if node._next is not None:
                assert node._next._prev() == node
            if node._prev() is not None:
                assert node._prev()._next == node
            ret.append(node.obj)
            node = node._next
        return ret

    def getiter(self, first, last):
        """
        Return an iterator over [first,last].

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> start = l.append(2)
        >>> l.extend([3,4,5,6])
        >>> end = l.append(7)
        >>> l.extend([8,9])
        >>> [i() for i in l.getiter(start, end)]
        [2, 3, 4, 5, 6, 7]
        """
        class listiter(object):
            def __init__(self, first, last):
                self.node = first
                self.final_node = last

            def __iter__(self): return self
            def next(self):
                ret = self.node
                if ret is None:
                    raise StopIteration
                if ret is self.final_node:
                    self.node = None
                else:
                    self.node = self.node._next
                return ret
        return listiter(first, last)

    def append(self, item):
        """
        Add an item to the end of the list.

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> l.append(2)
        node(2)
        >>> l._check()
        [1, 2]
        """
        return self.insert(item, None)

    def appendleft(self, item):
        """
        Add an item to the beginning of the list.

        >>> l = llist()
        >>> l.appendleft(1)
        node(1)
        >>> l.appendleft(2)
        node(2)
        >>> l._check()
        [2, 1]
        """
        return self.insert(item, self._first)

    def pop(self):
        """
        Remove an item from the end of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.pop()
        3
        >>> l.pop()
        2
        >>> l.pop()
        1
        >>> l.pop()
        Traceback (most recent call last):
        ...
        IndexError: pop from empty llist
        """
        if self._last() is None:
            raise IndexError, "pop from empty llist"
        return self.remove(self._last())

    def popleft(self):
        """
        Remove an item from the beginning of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.popleft()
        1
        >>> l.popleft()
        2
        >>> l.popleft()
        3
        >>> l.popleft()
        Traceback (most recent call last):
        ...
        IndexError: popleft from empty llist
        """
        if self._first is None:
            raise IndexError, "popleft from empty llist"
        return self.remove(self._first)

    def splice(self, source, node=None):
        """
        Splice the contents of source into this list before node; if node is None, insert at
        the end.  Empty source_list.  Return the first and last nodes that were moved.

        # Test inserting at the beginning.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, a)
        (node(4), node(6))
        >>> l._check()
        [4, 5, 6, 1, 2, 3]
        >>> l2._check()
        []

        # Test inserting in the middle.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, b)
        (node(4), node(6))
        >>> l._check()
        [1, 4, 5, 6, 2, 3]
        >>> l2._check()
        []

        # Test inserting at the end.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, None)
        (node(4), node(6))
        >>> l._check()
        [1, 2, 3, 4, 5, 6]
        >>> l2._check()
        []

        # Test inserting a list with a single item.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4])
        >>> l.splice(l2, b)
        (node(4), node(4))
        >>> l._check()
        [1, 4, 2, 3]
        >>> l2._check()
        []
        """
        if source._first is None:
            return
        first = source._first
        last = source._last()

        if node is None:
            if self._last() is not None:
                self._last()._next = source._first
            source._first._prev = self._last
            self._last = source._last
            if self._first is None:
                self._first = source._first
        else:
            source._first._prev = node._prev
            source._last()._next = node
            if node._prev() is not None:
                node._prev()._next = source._first
            node._prev = source._last
            if node is self._first:
                self._first = source._first
        source._first = None
        source._last = _ref(None)
        return first, last

    def split(self, start, end=None):
        """
        Remove all items between [node, end] and return them in a new list.  If end is None,
        remove until the end of the list.

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l._check()
        [1, 2, 3, 4, 5]
        >>> l2 = l.split(c, e)
        >>> l._check()
        [1, 2]
        >>> l2._check()
        [3, 4, 5]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(a, c)
        >>> l._check()
        [4, 5]
        >>> l2._check()
        [1, 2, 3]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(b, d)
        >>> l._check()
        [1, 5]
        >>> l2._check()
        [2, 3, 4]
        """
        if end is None:
            end = self._last()

        ret = llist()

        # First, move the region into the new list.  It's important to do this first, or
        # once we remove the nodes from the old list, they'll be held only by weakrefs and
        # nodes could end up being collected before we put it into the new one.
        ret._first = start
        ret._last = _ref(end)

        # Hook our own nodes back together.
        if start is self._first:
            self._first = end._next
        if end is self._last():
            self._last = start._prev

        if start._prev() is not None:
            start._prev()._next = end._next
        if end._next is not None:
            end._next._prev = start._prev
        start._prev = _ref(None)
        end._next = None

        return ret

    def extend(self, items):
        """
        >>> l = llist()
        >>> l.extend([1,2,3,4,5])
        >>> l._check()
        [1, 2, 3, 4, 5]
        """
        for item in items:
            self.append(item)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

这里有一个blog post分享你的痛苦。它包括链表的实现和性能比较


也许blist会更好,但是(从here

The use cases where the BList is slightly slower than Python's list are as follows (O(log n) vs. O(1)):

  1. A large list that never changes length.
  2. A large lists where inserts and deletes are only at the end of the list (LIFO).

With that disclaimer out of the way, here are some of the use cases where the BLists is dramatically faster than the built-in list:

  1. Insertion into or removal from a large list (O(log n) vs. O(n))
  2. Taking large slices of large lists (O(log n) vs O(n))
  3. Making shallow copies of large lists (O(1) vs. O(n))
  4. Changing large slices of large lists (O(log n + log k) vs. O(n + k))
  5. Multiplying a list to make a large, sparse list (O(log k) vs. O(kn))

请注意,它实际上是作为B+树实现的,允许所有这些操作都有很好的性能

相关问题 更多 >