numpy.searchsorted 在结构化数组上的性能较差
如果我用错了术语,请提前原谅,欢迎纠正。
我有一个已经排好序的数组,数据类型是 '<f16, |S30'
。当我在它的第一个字段上使用 searchsorted
时,速度非常慢(大约需要0.4秒来处理300万个项目)。这个速度比用普通的Python元组列表通过 bisect
方法要慢得多。
%timeit a['f0'].searchsorted(400.)
1 loops, best of 3: 398 ms per loop
但是,如果我把浮点数部分复制到另一个单独的数组中,搜索的速度就比 bisect
还要快:
b = a['f0'].copy()
%timeit b.searchsorted(400.)
1000000 loops, best of 3: 945 ns per loop
我有几个问题:
- 我是不是做错了什么,还是NumPy出现了问题?
- 有没有办法在不重复数据的情况下解决这个问题?
2 个回答
3
这里有一些代码,用来说明问题的大小(截至2015年5月11日)以及如何“修复”它。
import numpy as np
import bisect
import timeit
from random import randint
dtype = np.dtype([ ('pos','<I'),('sig','<H') ]) # my data is unsigned 32bit, and unsigned 16bit
data1 = np.fromfile('./all2/840d.0a9b45e8c5344abf6ac761017e93b5bb.2.1bp.binary', dtype)
dtype2 = np.dtype([('pos',np.uint32),('sig',np.uint32)]) # convert data to both unsigned 32bit
data2 = data1.astype(dtype2)
data3 = data2.view(('uint32', len(data2.dtype.names))) # convert above to a normal array (not structured array)
print data1.dtype.descr # [('pos', '<u4'), ('sig', '<u2')]
print data2.dtype.descr # [('pos', '<u4'), ('sig', '<u4')]
print data3.dtype.descr # [('', '<u4')]
print data1.nbytes # 50344494
print data2.nbytes # 67125992
print data3.nbytes # 67125992
print data1['pos'].max() # 2099257376
print data2['pos'].max() # 2099257376
print data3[:,0].max() # 2099257376
def b1(): return bisect.bisect_left(data1['pos'], randint(100000000,200000000))
def b2(): return bisect.bisect_left(data2['pos'], randint(100000000,200000000))
def b3(): return bisect.bisect_left(data3[:,0], randint(100000000,200000000))
def ss1(): return np.searchsorted(data1['pos'], randint(100000000,200000000))
def ss2(): return np.searchsorted(data2['pos'], randint(100000000,200000000))
def ss3(): return np.searchsorted(data3[:,0], randint(100000000,200000000))
def ricob1(): return bisect.bisect_left(data1['pos'], np.uint32(randint(100000000,200000000)))
def ricob2(): return bisect.bisect_left(data2['pos'], np.uint32(randint(100000000,200000000)))
def ricob3(): return bisect.bisect_left(data3[:,0], np.uint32(randint(100000000,200000000)))
def ricoss1(): return np.searchsorted(data1['pos'], np.uint32(randint(100000000,200000000)))
def ricoss2(): return np.searchsorted(data2['pos'], np.uint32(randint(100000000,200000000)))
def ricoss3(): return np.searchsorted(data3[:,0], np.uint32(randint(100000000,200000000)))
print timeit.timeit(b1,number=300) # 0.0085117816925
print timeit.timeit(b2,number=300) # 0.00826191902161
print timeit.timeit(b3,number=300) # 0.00828003883362
print timeit.timeit(ss1,number=300) # 6.57477498055
print timeit.timeit(ss2,number=300) # 5.95308589935
print timeit.timeit(ss3,number=300) # 5.92483091354
print timeit.timeit(ricob1,number=300) # 0.00120902061462
print timeit.timeit(ricob2,number=300) # 0.00120401382446
print timeit.timeit(ricob3,number=300) # 0.00120711326599
print timeit.timeit(ricoss1,number=300) # 4.39265394211
print timeit.timeit(ricoss2,number=300) # 0.00116586685181
print timeit.timeit(ricoss3,number=300) # 0.00108909606934
更新! 感谢Rico的评论,看来为你想要进行搜索的数字设置类型真的很重要! 不过,在使用32位和16位整数的结构化数组上,速度还是比较慢(虽然比之前快多了)
13
我记得之前看到过这个。如果我没记错的话,searchsorted在数据不连续的时候会临时复制一份数据。如果我有时间的话,稍后会去看看代码确认一下是不是这样(或者也许有谁更熟悉代码的人可以确认一下)。
在此期间,如果你不想重构代码来避免使用结构化数组,最好的办法可能是用 bisect_left(a['f0'], 400.)
。在我的机器上,这个方法比在连续数组上使用searchsorted慢8倍,但比在不连续数组上使用searchsorted快1000倍。
In [5]: a = np.arange((6e6)).view([('f0', float), ('f1', float)])
In [6]: timeit a['f0'].searchsorted(400.)
10 loops, best of 3: 51.1 ms per loop
In [7]: timeit a['f0'].copy()
10 loops, best of 3: 51 ms per loop
In [8]: timeit bisect_left(a['f0'], 400.)
10000 loops, best of 3: 52.8 us per loop
In [9]: f0 = a['f0'].copy()
In [10]: timeit f0.searchsorted(400.)
100000 loops, best of 3: 7.85 us per loop