<p><strong>Approach #1:</strong> Vectorized approach -</p>
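<pre><code>import numpy as np

def vectorized_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    # Index pairs (r, c) with r < c: every unordered pair exactly once
    r, c = np.triu_indices(tot_vec, 1)
    subs = embed_vec[r] - embed_vec[c]
    # Squared Euclidean distance for each pair
    dists = np.einsum('ij,ij->i', subs, subs)
    # Count, per first index r, the pairs closer than R
    return np.bincount(r, dists < R**2, minlength=tot_vec)
</code></pre>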
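<p>To see how the pieces fit together, here is a small illustrative run on four hand-picked points. The sample data and the names <code>pts</code>, <code>close</code> and <code>counts</code> are assumptions made for this illustration. <code>np.triu_indices(n, 1)</code> lists every pair with <code>r &lt; c</code> once, <code>einsum</code> yields the squared distance of each pair, and <code>bincount</code> with boolean weights sums the hits per first index -</p>
<pre><code>import numpy as np

# Four hypothetical 2-D points on a line; R is the distance threshold.
pts = np.array([[0.0, 0.0], [0.3, 0.0], [1.0, 0.0], [1.2, 0.0]])
R = 0.5

n = pts.shape[0]
r, c = np.triu_indices(n, 1)                  # all index pairs with r < c
subs = pts[r] - pts[c]                        # one difference vector per pair
sq_dists = np.einsum('ij,ij->i', subs, subs)  # squared Euclidean distances
close = sq_dists < R**2                       # True where the pair is closer than R

# Sum the True flags per first index; the boolean weights are cast to float.
counts = np.bincount(r, close, minlength=n)
print(counts)  # [1. 0. 1. 0.]
</code></pre>
<p>Only the pairs (0, 1) and (2, 3) are closer than <code>R</code>, and each pair is credited to its lower index, which matches the counting done by the double loop in the original approach.</p>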
<p><strong>Approach #2:</strong> Less loopy approach (for very large arrays) -</p>
^{pr2}$
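<p>The listing for this approach is missing here. As a minimal sketch of what a less loopy variant could look like, assuming it keeps a single Python loop over the rows and vectorizes the inner comparison (the name <code>loopy_less_sketch</code> and its body are assumptions, not necessarily the original <code>loopy_less_app</code>) -</p>
<pre><code>import numpy as np

def loopy_less_sketch(embed_vec, R):
    """Hypothetical sketch: one loop over rows, vectorized inner step."""
    tot_vec = embed_vec.shape[0]
    R_sq = R ** 2
    out = np.zeros(tot_vec, dtype=int)
    for i in range(tot_vec - 1):
        # Differences between row i and every later row only (j > i)
        subs = embed_vec[i] - embed_vec[i + 1:]
        # Squared Euclidean distances, so no square root is needed
        sq_dists = np.einsum('ij,ij->i', subs, subs)
        out[i] = np.count_nonzero(sq_dists < R_sq)
    return out
</code></pre>
<p>Each iteration of such a loop allocates temporaries proportional to the number of remaining rows, so peak memory stays linear in the number of vectors instead of growing with the number of pairs.</p>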
<hr/>
<h2>Benchmarking</h2>
<p>Original approach -</p>
<pre><code>def loopy_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    p = np.zeros(tot_vec)  # This contains the number of close vectors
    for i in range(tot_vec-1):
        for j in range(i+1, tot_vec):
            if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
                p[i] += 1
    return p
</code></pre>
<p>Timings -</p>
<pre><code>In [76]: # Sample random array
    ...: embed_vec = np.random.rand(3000,3)
    ...: R = 0.5
    ...:

In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop

In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
</code></pre>
<p><strong><code>350x+</code></strong> speedup there!</p>
<p>Using a much bigger array with the proposed <code>loopy_less_app</code> -</p>
<pre><code>In [81]: # Sample random array
    ...: embed_vec = np.random.rand(20000,3)
    ...: R = 0.5
    ...:

In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
</code></pre>
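<p>A likely reason why only the less loopy version is timed at this size (an inference, not something stated above): the vectorized approach builds one difference vector per pair, so its memory use grows quadratically with the number of vectors. A back-of-the-envelope estimate, assuming <code>float64</code> data and 64-bit index arrays -</p>
<pre><code># Rough memory estimate for the all-pairs arrays of a fully vectorized
# approach (assumes float64 data and 64-bit integer indices).
n, dims = 20000, 3
n_pairs = n * (n - 1) // 2           # number of pairs with i < j
subs_bytes = n_pairs * dims * 8      # pairwise difference array
index_bytes = 2 * n_pairs * 8        # the two index arrays from triu_indices

print(n_pairs)                       # 199990000
print(round(subs_bytes / 1e9, 1))    # 4.8 (GB)
print(round(index_bytes / 1e9, 1))   # 3.2 (GB)
</code></pre>
<p>At roughly 8 GB for the differences and indices alone, a row-by-row loop that allocates only linear-size temporaries is the more practical choice for arrays of this size.</p>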