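I have a function that takes two arrays of 0s and 1s as input, about 8,000 elements each. My function eps computes a statistic on these arrays and returns the result. The operation is simple: it just checks for 0s and records the indices where they occur. I have done my best to optimize for speed, but the best I can get, measured with the timeit library, is 4.5 to 5 seconds (for 18k array pairs). Time matters because I need to run this function on billions of array pairs.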
# Example inputs:
# ts_1 = [0,1,1,0,0,1,1,0,......]
# ts_2 = [1,1,1,1,1,1,1,0,......]
# tau = any integer or float
import math
from itertools import product

import numpy as np

def eps(ts_1, ts_2, tau):
    n1 = 0
    n2 = 0
    Q_tau = 0
    q_tau = 0
    # Indices of the zeros ("events") in each series, using plain lists
    event_index1 = [index for index, item in enumerate(ts_1) if item == 0]
    n1 = ts_1.count(0)
    event_index2 = [index for index, item in enumerate(ts_2) if item == 0]
    n2 = ts_2.count(0)
    # tried numpy based on @Ram comment below, no improvement
    # (these assignments overwrite the list-based results above)
    event_index1, = np.where(np.array(ts_1) == 0)
    n1 = event_index1.shape[0]
    event_index2, = np.where(np.array(ts_2) == 0)
    n2 = event_index2.shape[0]
    # tried numpy based on @Ram comment below, no improvement
    if n1 == 0 or n2 == 0:
        Q_tau = 0
    else:
        # Events at the same index count 0.5 toward each direction
        c_ij = 0
        matching_idx = set(event_index1).intersection(event_index2)
        c_ij = c_ij + (0.5 * len(matching_idx))
        # Events in ts_1 that follow an event in ts_2 by at most tau
        for x, y in product(event_index1, event_index2):
            if x - y > 0 and (x - y) <= tau:
                c_ij = c_ij + 1
        c_ji = 0
        matching_idx_2 = set(event_index2).intersection(event_index1)
        c_ji = c_ji + (0.5 * len(matching_idx_2))
        # Events in ts_2 that follow an event in ts_1 by at most tau
        for x, y in product(event_index2, event_index1):
            if x - y > 0 and (x - y) <= tau:
                c_ji = c_ji + 1
        Q_tau = (c_ij + c_ji) / math.sqrt(n1 * n2)
        q_tau = (c_ij - c_ji) / math.sqrt(n1 * n2)
    return Q_tau, q_tau
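Following up on my earlier comment: since taking the product of the two lists in the opposite order just yields the same tuples reversed, you can reduce the code to a single loop over the product.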
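A minimal sketch of that idea, not necessarily the exact code meant here: `eps_single_loop` walks `product(e1, e2)` once and checks the difference in both directions. Because the pair loop is still quadratic in the number of events, the sketch also includes `eps_sorted`, a swapped-in alternative that counts the pairs with `np.searchsorted` on the sorted index arrays. The function names and the helper `_count_within` are hypothetical, and both versions assume the inputs contain only 0s and 1s.

```python
import math
from itertools import product

import numpy as np


def eps_single_loop(ts_1, ts_2, tau):
    """One pass over product(e1, e2), counting both directions (hypothetical name)."""
    e1 = [i for i, v in enumerate(ts_1) if v == 0]
    e2 = [i for i, v in enumerate(ts_2) if v == 0]
    if not e1 or not e2:
        return 0, 0
    # Events at the same index contribute 0.5 to each direction.
    c_ij = c_ji = 0.5 * len(set(e1) & set(e2))
    for x, y in product(e1, e2):
        d = x - y
        if 0 < d <= tau:       # event in ts_1 follows event in ts_2
            c_ij += 1
        elif 0 < -d <= tau:    # event in ts_2 follows event in ts_1
            c_ji += 1
    norm = math.sqrt(len(e1) * len(e2))
    return (c_ij + c_ji) / norm, (c_ij - c_ji) / norm


def _count_within(a, b, tau):
    """Count pairs (x in a, y in b) with 0 < x - y <= tau; b must be sorted."""
    hi = np.searchsorted(b, a, side="left")        # elements of b that are < x
    lo = np.searchsorted(b, a - tau, side="left")  # elements of b that are < x - tau
    # Clamp at 0 so a negative tau yields no matches, as in the loop version.
    return int(np.maximum(hi - lo, 0).sum())


def eps_sorted(ts_1, ts_2, tau):
    """Same statistic without the pair loop (assumed alternative, not from the answer)."""
    e1 = np.flatnonzero(np.asarray(ts_1) == 0)  # sorted indices of zeros
    e2 = np.flatnonzero(np.asarray(ts_2) == 0)
    if e1.size == 0 or e2.size == 0:
        return 0, 0
    half = 0.5 * np.intersect1d(e1, e2, assume_unique=True).size
    c_ij = half + _count_within(e1, e2, tau)
    c_ji = half + _count_within(e2, e1, tau)
    norm = math.sqrt(e1.size * e2.size)
    return (c_ij + c_ji) / norm, (c_ij - c_ji) / norm
```

For example, with ts_1 = [0,1,1,0,0,1,1,0], ts_2 = [1,1,1,1,1,1,1,0] and tau = 3, both functions return Q_tau = 1.0 and q_tau = -0.5. Whether the second version is actually faster on your data is worth checking with timeit, since it depends on how many zeros each array contains.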