What is the most efficient way to compute the squared Euclidean distance between N samples and the cluster centroids?

import numpy as np

def dist_2(X, y):
    X_square_sum = np.sum(np.square(X), axis=1)
    y_square_sum = np.sum(np.square(y), axis=1)
    dot_xy = np.dot(X, y.T)
    X_square_sum_tile = np.tile(X_square_sum.reshape(-1, 1), (1, y.shape[0]))
    y_square_sum_tile = np.tile(y_square_sum.reshape(1, -1), (X.shape[0], 1))
    dist = X_square_sum_tile + y_square_sum_tile - (2 * dot_xy)
    return dist

dist = dist_2(X, y)
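For reference, the function above appears to be a vectorized form of the following expansion. Here $x_i$ denotes the i-th row of X and $y_j$ the j-th row of y; the index names are introduced only for illustration.

```latex
\lVert x_i - y_j \rVert^2
  = \sum_k (x_{ik} - y_{jk})^2
  = \lVert x_i \rVert^2 + \lVert y_j \rVert^2 - 2\, x_i \cdot y_j
```

The two norm terms correspond to the tiled row sums, and the last term to the matrix product `np.dot(X, y.T)`.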

1 answer

User

Answer #1 · posted 2024-04-19 07:21:00

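This question often comes up in combination with nearest-neighbor search. If that is your use case, take a look at a kd-tree approach. It will be far more efficient than computing all Euclidean distances, both in memory consumption and in performance.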
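A minimal sketch of the kd-tree idea, assuming SciPy is available and that only the nearest centroid per sample is needed. The helper name `nearest_centroid` and the array sizes are made up for illustration. Kd-trees tend to pay off mainly for low-dimensional data.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_centroid(X, centroids):
    """Return (index, squared distance) of the nearest centroid for each row of X."""
    tree = cKDTree(centroids)        # build the tree once over the centroids
    dist, idx = tree.query(X, k=1)   # Euclidean distance to the single nearest centroid
    return idx, dist ** 2

rng = np.random.default_rng(0)
X = rng.random((1000, 3))            # illustrative sample data (assumption)
centroids = rng.random((8, 3))       # illustrative centroids (assumption)
labels, sq_dist = nearest_centroid(X, centroids)
```

This avoids materializing the full N x K distance matrix, which is where the memory saving would come from.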

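If that is not the case, here are some possible approaches. The first two come from an answer by Divakar. The third uses Numba for JIT compilation. Its main difference from the first two versions is that it avoids temporary arrays.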

Three approaches for computing Euclidean distances

import numpy as np
import numba as nb

# @Paul Panzer
#https://stackoverflow.com/a/42994680/4045774
def outer_sum_dot_app(A,B):
    return np.add.outer((A*A).sum(axis=-1), (B*B).sum(axis=-1)) - 2*np.dot(A,B.T)

# @Divakar
#https://stackoverflow.com/a/42994680/4045774
def outer_einsum_dot_app(A,B):
    return np.einsum('ij,ij->i',A,A)[:,None] + np.einsum('ij,ij->i',B,B) - 2*np.dot(A,B.T)

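@nb.njit()
def calc_dist(A, B, sqrt=False):
    # Cross term A @ B.T; the final result is written into this array in place
    dist = np.dot(A, B.T)

    # Squared row norms of A, accumulated in explicit loops (no temporary 2-D arrays)
    TMP_A = np.empty(A.shape[0], dtype=A.dtype)
    for i in range(A.shape[0]):
        acc = 0.
        for j in range(A.shape[1]):
            acc += A[i, j]**2
        TMP_A[i] = acc

    # Squared row norms of B
    TMP_B = np.empty(B.shape[0], dtype=A.dtype)
    for i in range(B.shape[0]):
        acc = 0.
        for j in range(B.shape[1]):
            acc += B[i, j]**2
        TMP_B[i] = acc

    # Combine: |a|^2 + |b|^2 - 2*a.b, optionally taking the square root
    if sqrt:
        for i in range(A.shape[0]):
            for j in range(B.shape[0]):
                dist[i, j] = np.sqrt(-2.*dist[i, j] + TMP_A[i] + TMP_B[j])
    else:
        for i in range(A.shape[0]):
            for j in range(B.shape[0]):
                dist[i, j] = -2.*dist[i, j] + TMP_A[i] + TMP_B[j]
    return dist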
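One caveat worth checking: all three functions above use the expansion |a|^2 + |b|^2 - 2 a·b, and floating-point rounding can make the result slightly negative for (near-)identical points, which would turn into NaN under a square root. The following is a minimal NumPy-only sketch of clamping such values and validating against the direct formula. The helper name `sq_dist_clamped` and the test data are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def sq_dist_clamped(A, B):
    """Squared Euclidean distances via the dot-product expansion, clamped at 0."""
    a2 = np.einsum('ij,ij->i', A, A)[:, None]   # squared row norms of A, shape (n, 1)
    b2 = np.einsum('ij,ij->i', B, B)[None, :]   # squared row norms of B, shape (1, m)
    d = a2 + b2 - 2.0 * (A @ B.T)
    np.maximum(d, 0.0, out=d)                   # clamp small negative rounding errors in place
    return d

rng = np.random.default_rng(0)
A = rng.random((200, 16))
B = np.vstack([A[:5], rng.random((20, 16))])    # first 5 rows duplicate rows of A

d = sq_dist_clamped(A, B)
# Direct computation for comparison; simple but memory-hungry (n x m x dim temporary)
reference = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
```

The direct version is only practical for small inputs, which is why it serves here as a reference rather than as an alternative.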

Timings

[Timing code and results are missing from the source.]
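To measure timings on your own machine, a small harness along these lines could be used. It is a sketch under assumptions: the helper `best_time`, the array sizes, and the NumPy-only function `outer_einsum_dot` (same formula as `outer_einsum_dot_app`) are chosen for illustration, and no particular result is implied.

```python
import timeit
import numpy as np

def outer_einsum_dot(A, B):
    # NumPy-only variant using the same formula as outer_einsum_dot_app
    return (np.einsum('ij,ij->i', A, A)[:, None]
            + np.einsum('ij,ij->i', B, B)
            - 2 * np.dot(A, B.T))

def best_time(func, *args, repeat=5, number=10):
    """Best average runtime per call, in seconds, over several repeats."""
    func(*args)  # warm-up call; for a Numba function this would trigger compilation
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number

rng = np.random.default_rng(0)
A = rng.random((1000, 3))   # array sizes are arbitrary assumptions
B = rng.random((100, 3))
elapsed = best_time(outer_einsum_dot, A, B)
print(f"outer_einsum_dot: {elapsed * 1e6:.1f} µs per call")
```

The warm-up call matters when comparing against the Numba version, since its first call includes compilation time.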
