An implementation of Bloom filters.
bloompy
Four kinds of Bloom filter implemented in Python 3.
bloompy includes the standard BloomFilter, CountingBloomFilter, ScalableBloomFilter, and ScalableCountingBloomFilter. It is an update of pybloom.
Installation
pip install bloompy
Usage
bloompy provides four kinds of Bloom filter.
- Standard Bloom filter
The standard filter only supports querying and adding elements.
>>> import bloompy
>>> bf = bloompy.BloomFilter(error_rate=0.001, element_num=10**3)

# add() reports whether the element was already inside the bf and then
# adds it. A returned False means the element was not inside the filter.
>>> bf.add(1)
False
>>> bf.add(1)
True
>>> 1 in bf
True
>>> bf.exists(1)
True
>>> bf.add([1, 2, 3])
False
>>> bf.add([1, 2, 3])
True
>>> [1, 2, 3] in bf
True
>>> bf.exists([1, 2, 3])
True

# Store the bf in a file.
>>> bf.tofile('filename.suffix')

# Recover a bf from a file. The kind of filter is recognized automatically.
>>> recovered_bf = bloompy.get_filter_fromfile('filename.suffix')

# Or use the classmethod 'fromfile' of the BloomFilter class to get a
# BloomFilter instance from a file. The other filter classes work the same way.
>>> recovered_bf = bloompy.BloomFilter.fromfile('filename.suffix')

# The total number of elements inside the bf.
>>> bf.count
2

# The capacity of the current filter.
>>> bf.capacity
1000

# The bit array of the current filter.
>>> bf.bit_array
bitarray('00....')

# The total length of the bit array.
>>> bf.bit_num
14400

# The hash seeds inside the filter.
# They are prime numbers by default and can be modified.
>>> bf.seeds
[2, 3, 5, 7, 11,...]

# The number of hash functions.
>>> bf.hash_num
10
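The bit_num and hash_num values reported above depend on error_rate and element_num. As a rough sketch, the textbook sizing formulas for a Bloom filter can be written as follows. The function name bloom_parameters is hypothetical and not part of bloompy, and bloompy may round differently: the session above reports 14400 bits, slightly more than the textbook minimum.

```python
import math


def bloom_parameters(element_num, error_rate):
    """Textbook Bloom filter sizing: returns (bit_num, hash_num).

    Hypothetical helper for illustration; not part of bloompy.
    """
    # Minimum number of bits: m = -n * ln(p) / (ln 2)^2
    bit_num = math.ceil(-element_num * math.log(error_rate) / (math.log(2) ** 2))
    # Matching number of hash functions: k = (m / n) * ln 2
    hash_num = max(1, round(bit_num / element_num * math.log(2)))
    return bit_num, hash_num


bit_num, hash_num = bloom_parameters(10**3, 0.001)
print(bit_num, hash_num)
```

For 1000 elements at an error rate of 0.001 this gives 10 hash functions, which agrees with the hash_num shown in the session.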
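- Counting Bloom filter
The counting Bloom filter is a subclass of the standard Bloom filter, but it supports the delete operation. It uses 4 bits to represent each bit of the standard bf, so it costs 4 times as much memory as the standard filter.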
>>> import bloompy
>>> cbf = bloompy.CountingBloomFilter(error_rate=0.001, element_num=10**3)

# The add operation behaves the same as in the standard bf.
>>> cbf.add(12)
False
>>> cbf.add(12)
True
>>> 12 in cbf
True
>>> cbf.count
1

# delete() reports whether the element was inside the cbf and,
# if it was, deletes it.
>>> cbf.delete(12)
True
>>> cbf.delete(12)
False
>>> 12 in cbf
False
>>> cbf.count
0

# Recover a cbf from a file. Same as the bf.
>>> recovered_cbf = bloompy.CountingBloomFilter.fromfile('filename.suffix')
All operations of the standard BloomFilter also work on it.
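To illustrate why counters make deletion possible, here is a minimal pure-Python sketch of a counting Bloom filter. It is not bloompy's implementation: the class name TinyCountingBloom, the SHA-256 based hashing, and the default sizes are all assumptions chosen for the example. The counter limit of 15 mirrors the 4 bits per slot mentioned above.

```python
import hashlib


class TinyCountingBloom:
    """Illustrative counting Bloom filter; not bloompy's implementation."""

    def __init__(self, slot_num=1024, hash_num=4, max_count=15):
        self.slot_num = slot_num
        self.hash_num = hash_num
        self.max_count = max_count  # a 4-bit counter saturates at 15
        self.counters = [0] * slot_num

    def _slots(self, item):
        # Derive hash_num slot indexes by salting one SHA-256 hash.
        for seed in range(self.hash_num):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.slot_num

    def __contains__(self, item):
        # Present only if every slot for the item has a non-zero counter.
        return all(self.counters[s] > 0 for s in self._slots(item))

    def add(self, item):
        """Insert item; return True if it already appeared to be present."""
        present = item in self
        if not present:
            for s in self._slots(item):
                if self.counters[s] < self.max_count:
                    self.counters[s] += 1
        return present

    def delete(self, item):
        """Remove item; return True if it appeared to be present."""
        if item not in self:
            return False
        for s in self._slots(item):
            if self.counters[s] > 0:
                self.counters[s] -= 1
        return True


cbf = TinyCountingBloom()
first_add = cbf.add(12)
second_add = cbf.add(12)
first_delete = cbf.delete(12)
second_delete = cbf.delete(12)
print(first_add, second_add, first_delete, second_delete)
```

A plain bit array cannot support this, because clearing a bit might also erase another element that shares it. Decrementing a counter leaves the other element's contribution in place.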
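- Scalable Bloom filter
When the number of inserted elements reaches the limit, the filter's capacity grows automatically. By default, each new internal filter has 2 times the capacity of the previous one.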
>>> import bloompy
>>> sbf = bloompy.ScalableBloomFilter(error_rate=0.001, initial_capacity=10**3)

# At first, the sbf has a capacity limit of 1000.
>>> len(sbf)
0
>>> 12 in sbf
False
>>> sbf.add(12)
False
>>> 12 in sbf
True
>>> len(sbf)
1
>>> sbf.filters
[<bloompy.BloomFilter object at 0x000000000B6F5860>]
>>> sbf.capacity
1000

# When the number of inserted elements surpasses the limit of 1000,
# the sbf appends a new internal filter with a capacity of 2 times 1000.
>>> for i in range(1000):
...     sbf.add(i)
>>> 600 in sbf
True
>>> len(sbf)
2
>>> sbf.filters
[<bloompy.BloomFilter object at 0x000000000B6F5860>, <bloompy.BloomFilter object at 0x000000000B32F748>]
>>> sbf.capacity
3000

# Recover a sbf from a file. Same as the bf.
>>> recovered_sbf = bloompy.ScalableBloomFilter.fromfile('filename.suffix')
All operations of the standard BloomFilter also work on it.
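The capacity figures in the session above (1000 with one internal filter, 3000 with two) are consistent with each new filter being twice as large as the previous one. The arithmetic can be sketched as below. The helper total_capacity is hypothetical, and the assumption that growth stays geometric beyond the second filter is not confirmed by the text.

```python
def total_capacity(initial_capacity, filter_count, growth=2):
    """Sum of capacities when each new filter is `growth` times the previous one.

    Hypothetical helper for illustration; not part of bloompy.
    """
    return sum(initial_capacity * growth ** i for i in range(filter_count))


one_filter = total_capacity(1000, 1)   # just the initial filter
two_filters = total_capacity(1000, 2)  # 1000 + 2000
print(one_filter, two_filters)
```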
- Scalable counting Bloom filter
It is a subclass of ScalableBloomFilter, but it supports the delete operation. All operations of ScalableBloomFilter also work on it.
>>> import bloompy
>>> scbf = bloompy.SCBloomFilter(error_rate=0.001, initial_capacity=10**3)
>>> scbf.add(1)
False
>>> 1 in scbf
True
>>> scbf.delete(1)
True
>>> 1 in scbf
False
>>> len(scbf)
1
>>> scbf.filters
[<bloompy.CountingBloomFilter object at 0x000000000B6F5828>]

# Add elements to the scbf until it passes its capacity limit.
>>> for i in range(1100):
...     scbf.add(i)
>>> len(scbf)
2
>>> scbf.filters
[<bloompy.CountingBloomFilter object at 0x000000000B6F5828>, <bloompy.CountingBloomFilter object at 0x000000000B6F5898>]

# Recover a scbf from a file. Same as the bf.
>>> recovered_scbf = bloompy.SCBloomFilter.fromfile('filename.suffix')
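Storing and recovering
As shown for the standard Bloom filter, a filter is stored with tofile(). You can recover a stored filter in two ways:
- the classmethod 'fromfile'
- get_filter_fromfile()
If you know for certain that a BloomFilter is stored in a file, you can recover it with:
bloompy.BloomFilter.fromfile('filename.suffix')
or, if the file holds a CountingBloomFilter: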
bloompy.CountingBloomFilter.fromfile('filename.suffix')
The other filter classes work the same way.
But if you don't know which kind of filter is stored in the file, use:
bloompy.get_filter_fromfile('filename.suffix')
It automatically recognizes the kind of filter stored in the file, so you can then work with the recovered filter.
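How bloompy recognizes the filter type is not described here. One common way to get this behavior, shown purely as a sketch, is to store a type tag next to the data and look the class up when loading. Every name below (StandardFilter, CountingFilter, save_filter, load_any_filter) and the JSON file layout are hypothetical and do not reflect bloompy's actual file format.

```python
import json
import os
import tempfile


class StandardFilter:
    """Hypothetical stand-in for a standard filter (not a bloompy class)."""

    def __init__(self, bits):
        self.bits = bits


class CountingFilter(StandardFilter):
    """Hypothetical stand-in for a counting filter (not a bloompy class)."""


# Map the tag stored in the file back to the class that wrote it.
KINDS = {cls.__name__: cls for cls in (StandardFilter, CountingFilter)}


def save_filter(flt, path):
    # Store the class name next to the payload.
    with open(path, "w") as fh:
        json.dump({"kind": type(flt).__name__, "bits": flt.bits}, fh)


def load_any_filter(path):
    # Read the tag, then build an instance of the matching class.
    with open(path) as fh:
        record = json.load(fh)
    return KINDS[record["kind"]](record["bits"])


path = os.path.join(tempfile.mkdtemp(), "filter.json")
save_filter(CountingFilter([0, 1, 0, 2]), path)
restored = load_any_filter(path)
print(type(restored).__name__)
```

With a tag in the file, the caller does not need to know in advance which class to use, which matches the behavior described for get_filter_fromfile().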