高度优化的搜索树(红黑、展开和排序列表),带有可选的扩展(动态顺序统计、间隔树等)

Banyan的Python项目详细描述


这个包提供了python的setdict(带有可选的扩展)的排序插入版本。由于是基于树的,它们在简单的查找和修改方面不如基于散列的容器(如python的内置)高效。相反:

  • (常见情况:)对于修改和查找与排序迭代交错的情况,它们的效率远远高于它们。
  • (不太常见的情况是:)通过可选的tree augmentation,它们在利用底层树结构(例如,该项在集合中的顺序位置是什么)进行其他类型的有用查询时的效率远远高于它们。哪个几何区间与此区间重叠?).

功能

  • 支持针对不同用例的高性能算法:

  • 提供pythonic接口:

    • Collections are almost entirely drop-in sorted replacements for Python’s set and dict
    • User defined ^{tt1}$ functions (or older style ^{tt2}$ functions) are supported
    • Methods take slices and ranges where applicable
    • Pickle is supported
    • Reverse-order views are provided
  • 允许使用附加节点元数据和树方法的可选tree augmentation

    • Dynamic order statistics allow queries for the k-th item
    • Interval trees allow geometric querying
    • Any user-defined algorithm can be easily plugged in

    Note

    This feature can be almost entirely ignored if all that is needed are efficient sorted drop-in replacemnt containers for Python’s sets and dicts.

  • 优化实现(请参阅联机文档中的Performance部分):

    • C++ templated backend drives the implementation. C++ template metaprogramming is used to avoid run-time queries along the search path
    • Homogeneous-key trees optimization

几个快速示例

注意

以下示例假设首先键入:

>>> from banyan import *
  • 选择一个适合设置的算法,并获得python内置项的下拉排序替换:

    • A (red-black tree) general drop-in replacement for Python’s set:

      >>> t = SortedSet([2, 3, 1])
      >>> t
      SortedSet([1, 2, 3])
      >>> assert 2 in t
      >>> t.add(4)
      >>> len(t)
      4
      >>> t.add(4)
      >>> len(t)
      4
      >>> t.remove(4)
      >>> len(t)
      3
      >>> t.remove(4)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "banyan/__init__.py", line 299, in remove
          self._tree.erase(item)
      KeyError: 4
      
    • 一个基于splay的排序的drop-in替换python的dict,针对临时局部访问进行了优化:

      >>> t = SortedDict([(2, 'b'), (3, 'c'), (1, 'a')], alg = SPLAY_TREE)
      >>> print(list(t))
      [1, 2, 3]
      >>> assert 1 in t
      >>> assert 4 not in t
      >>> # Now access the key 2 for awhile
      >>> t[2] = 'e'
      >>> t[2] = 'f'
      >>> t[2] = 'g'
      >>> t[2] = 'a'
      >>> t[2] = 'b'
      >>> t[2] = 'c'
      >>> t[2] = 'd'
      >>> t[2] = 'e'
      
    • python的frozenset的(基于排序列表的)排序内存效率下拉列表:

      >>> t = FrozenSortedSet(['hao', 'jiu', 'mei', 'jian'])
      >>> assert 'hao' in t
      >>> assert 'zaijian' not in t
      >>> t.add('zaijian')
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      AttributeError: 'FrozenSortedSet' object has no attribute 'add'
      
  • 指定比较条件-例如,使用带有小写比较的字符串字典:

    • Using the newer-style ^{tt1}$ parameter:
    >>> t = SortedDict(key = str.lower)
    >>> t['hao'] = 3
    >>> t['Hao'] = 4
    >>> t
    SortedDict({'Hao': 4})
    
    • Using the older-style (pre-Py3K) ^{tt2}$ parameter:
    >>> t = SortedDict(compare = lambda x, y: cmp(str.lower(x), str.lower(y)))
    >>> t['hao'] = 3
    >>> t['Hao'] = 4
    >>> t
    SortedDict({'Hao': 4})
    
  • 使用范围和切片:
    >>> import string
    >>>
    >>> t = SortedDict(zip(string.ascii_lowercase, string.ascii_uppercase))
    >>>
    >>> # Erase everything starting at 'e'
    >>> del t['e': ]
    >>> t
    SortedDict({'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'})
    >>>
    >>> # View the items between 'b' and 'd'
    >>> t.items('b', 'd')
    ItemsView([('b', 'B'), ('c', 'C')])
    >>>
    >>> # View the values of the keys between 'a' and 'c', in reverse order
    >>> t.values('a', 'c', reverse = True)
    ValuesView(['B', 'A'])
    >>>
    >>> # Set the three values of the keys between 'a' and 'd' to 2
    >>> t['a': 'd'] = [2, 2, 2]
    >>> t
    SortedDict({'a': 2, 'b': 2, 'c': 2, 'd': 'D'})
    
  • 易于使用的同构密钥优化:

    >>> # Simply specify the key type as key_type - no other changes are needed.
    >>> t = SortedSet([1, 2, 3], key_type = int)
    >>> assert 2 in t
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], key_type = str)
    >>> assert 'hola' not in t
    
  • 使用pythonic版本的C++/STL’s lower_bound/upper_bound
    >>> from itertools import *
    >>>
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'])
    >>> t
    SortedSet(['hao', 'jian', 'jiu', 'mei'])
    >>> assert 'hao' in t
    >>>
    >>> # Find the key after 'hao'
    >>> keys = t.keys('hao', None)
    >>> next(islice(keys, 1, None))
    'jian'
    
  • 利用树结构可获得其他高效功能:

    • Use an overlapping-interval updator for creating a data structure that can efficiently answer overlapping queries:
    >>> t = SortedSet([(1, 3), (2, 4), (-2, 9)], updator = OverlappingIntervalsUpdator)
    >>>
    >>> print(t.overlap_point(-5))
    []
    >>> print(t.overlap_point(5))
    [(-2, 9)]
    >>> print(t.overlap_point(3.5))
    [(-2, 9), (2, 4)]
    >>>
    >>> print(t.overlap((-10, 10)))
    [(-2, 9), (1, 3), (2, 4)]
    

    For high performance, combine this with homogeneous-keys optimization:

    >>> t = SortedSet(key_type = (int, int), updator = OverlappingIntervalsUpdator)
    >>> t.add((1, 3))
    >>> t.overlap_point(2)
    [(1, 3)]
    >>>
    >>> t = SortedSet(key_type = (float, float), updator = OverlappingIntervalsUpdator)
    >>> t.add((1.0, 3.0))
    >>> t.overlap_point(2.5)
    [(1.0, 3.0)]
    
    • Use a rank updator for creating a data structure that can efficiently answer order queries:
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator)
    >>> t
    SortedSet(['hao', 'jian', 'jiu', 'mei'])
    >>>
    >>> # 'hao' is item no. 0
    >>> t.kth(0)
    'hao'
    >>> t.order('hao')
    0
    >>>
    >>> # 'mei' is item no. 3
    >>> t.kth(3)
    'mei'
    >>> t.order('mei')
    3
    
    • Use a min-gap updator for creating a data structur that can efficiently answer smallest-gap queries:
    >>> t = SortedSet([1, 3, 2], updator = MinGapUpdator)
    >>> t
    SortedSet([1, 2, 3])
    >>> t.min_gap()
    1
    >>> t.remove(2)
    >>> t
    SortedSet([1, 3])
    >>> t.min_gap()
    2
    

下载、安装、文档和错误跟踪

  • 包裹在PyPI

  • 通常使用python库的设置。类型:

    ^{tt5}$

    or

    ^{tt6}$

    Note

    To install this package from the source distribution, the system must have a C++ compiler installed. The setup script will invoke this compiler.

    Using Python 2.x on Windows will attempt to invoke Visual Studio 2008. If you are using a later version, download and extract the archive; then, from within the Banyan directory, use

    ^{tt7}$

    or

    ^{tt8}$

    (for Visual Studio 2010 and 2012, respectively), followed by

    ^{tt9}$

  • 文档位于PyPI Docs,也可以在发行版的“docs”目录中找到。

  • 错误跟踪在Google Code上。

更改

Changes
VersionDateDescription
0.1.505/04/2013Faster red-black tree iteration; minor documentation bugfixes
0.1.401/04/2013User key-function specification optimization; performance tests for dictionary types; better warnings for user mismatched policies
0.1.3.530/3/2013OverlappingIntervalsUpdator: more regression tests + performance improvements + performance comparison tests
0.1.328/03/2013OverlappingIntervalsUpdator implemented; minor documentation bugfixes
0.1.226/03/2013Efficient C++ RankUpdator and MinGapUpdator; MinMaxUpdator out; major internal simplification
0.1.017/03/2013Initial release

计划功能

  • 改进文档和性能文档w.r.t.密钥类型和策略规范。
  • 对用户不匹配的策略给出更好的警告
  • 实现优先级搜索树更新程序

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
java我应该关闭tcp连接吗?   java指定初始化一个有引用和没有引用的类之间的区别   Java JSON反序列化错误   java将InputStream插入PostgreSQL   java Android屏幕在活动启动时取消伪装   java两个字符串实例看起来相同,但它们的哈希代码不同   java如何创建**数字**而不是字符串的数组列表?   java我可以确定由正则表达式模式匹配的第一个字符集吗?   java以编程方式更改日期范围的日期格式   java Hibernate在加载时填充自动连接字段   java如何使两个不相关的实体(两个存储库)同时在一个项目中运行?可能吗?   使用singlechildevent检索java Firebase数据   在安卓中尝试动态添加片段时未找到java ID   在HTML中编码Java GB2312字符串无法正确显示   java在缓慢的消费卡夫卡上处理背压并避免重新平衡   由hibernate生成的java查询过于冗长