Agglomerative clustering with a custom pairwise distance function

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance  # missing in the original post; needed for distance.cdist
from scipy.cluster.hierarchy import dendrogram, linkage, ward
from scipy.cluster.hierarchy import fcluster

# 4 samples, each holding the (x, y) coordinates of 10 players
data = np.array([
    [[1, 2], [3, 4], [1, 2], [3, 4], [1, 2],
     [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
    [[5, 6], [7, 8], [5, 6], [7, 8], [5, 6],
     [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]],
    [[1, 15], [3, 2], [1, 2], [5, 4], [1, 2],
     [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
    [[5, 1], [7, 8], [5, 6], [7, 1], [5, 6],
     [7, 8], [5, 1], [7, 8], [5, 6], [7, 8]],
])

def wasserstein_distance_function(f1, f2):
    min_cost = np.inf
    f1 = f1.reshape((10, 2))
    f2 = f2.reshape((10, 2))
    # Try a small grid of scale factors and keep the cheapest optimal assignment
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = distance.cdist(l * f1, k * f2, 'sqeuclidean')
            row_ind, col_ind = linear_sum_assignment(cost)
            curr_cost = cost[row_ind, col_ind].sum()
            if curr_cost < min_cost:
                min_cost = curr_cost
    return min_cost

def pairwise_wasserstein(points):
    """
    Helper function to perform the pairwise distance function
    of all points within 'points' parameter
    """
    # Note: this only prints the distances; it returns None
    for first_index in range(0, points.shape[0]):
        for second_index in range(first_index + 1, points.shape[0]):
            print("First index: ", first_index,
                  ", Second index: ", second_index,
                  ", Distance: ",
                  wasserstein_distance_function(points[first_index], points[second_index]))

def find_clusters_formation(data):
    """
    Method to find the clusters for the points array
    """
    dist_mat = pairwise_wasserstein(data)  # None, because the helper returns nothing
    Z = ward(dist_mat)
    cluster = fcluster(Z, 3, criterion='maxclust')

2 Answers

User

#1 · Edited 2024-05-01 21:52:35

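Update:

I got it working by combining the x and y coordinates of all 10 players into a single [1, 20] array of the form [x1, y1, x2, y2, ..., x10, y10], and then reshaping it inside wasserstein_distance_function as shown above.

I am not yet 100% sure the approach is valid, but the first results look promising (i.e. moderately balanced clusters).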
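
The flatten-then-reshape idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual code: the name assignment_distance, the toy samples, and the choice of average linkage are all mine. Average linkage is used here because SciPy documents Ward linkage as well defined only for Euclidean distances, which a custom assignment cost is not.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance

N_PLAYERS = 10  # assumption: 10 players per sample, as in the question


def assignment_distance(u, v):
    """Cheapest optimal-assignment cost between two flattened (20,) samples."""
    f1 = u.reshape((N_PLAYERS, 2))
    f2 = v.reshape((N_PLAYERS, 2))
    best = np.inf
    # Same 3 x 3 grid of scale factors as the question's function.
    for s1 in np.linspace(0.8, 1.2, 3):
        for s2 in np.linspace(0.8, 1.2, 3):
            cost = distance.cdist(s1 * f1, s2 * f2, "sqeuclidean")
            rows, cols = linear_sum_assignment(cost)
            best = min(best, cost[rows, cols].sum())
    return best


# Toy data (hypothetical): samples 0 and 1 match the first two samples in
# the question; samples 2 and 3 are shifted copies of them.
a = np.tile([[1, 2], [3, 4]], (5, 1))
b = np.tile([[5, 6], [7, 8]], (5, 1))
samples = np.stack([a, b, a + 0.5, b + 0.5]).astype(float)  # (4, 10, 2)

# Flatten each sample to [x1, y1, ..., x10, y10] so pdist sees 2-D input.
flat = samples.reshape(len(samples), -1)  # (4, 20)

# pdist accepts a callable and returns the condensed upper triangle.
condensed = distance.pdist(flat, metric=assignment_distance)  # (6,)

Z = linkage(condensed, method="average")
labels = fcluster(Z, 2, criterion="maxclust")
print(labels)
```

Because pdist only ever hands the metric two 1-D rows, the reshape inside the metric is what restores the (10, 2) layout the assignment step needs.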

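User

#2 · Edited 2024-05-01 21:52:35

If you want to use a precomputed metric, you have to build a distance matrix: a square matrix with zeros on the diagonal. The diagonal is zero because the distance from a point to itself is zero. You then pass this matrix as the argument to the clustering algorithm's fit_predict function.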
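
A quick sanity check of those properties might look like the sketch below. The helper name check_distance_matrix is hypothetical, and the symmetry and non-negativity checks are extra assumptions of mine that go beyond what the answer states.

```python
import numpy as np
from scipy.spatial.distance import squareform


def check_distance_matrix(D, tol=1e-9):
    """Raise ValueError unless D looks like a usable precomputed distance matrix."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("distance matrix must be square")
    if not np.allclose(np.diag(D), 0.0, atol=tol):
        raise ValueError("diagonal must be zero")
    if not np.allclose(D, D.T, atol=tol):
        raise ValueError("matrix must be symmetric")  # assumption, see lead-in
    if (D < 0).any():
        raise ValueError("distances must be non-negative")  # assumption
    return D


D = check_distance_matrix([[0.0, 2.0, 5.0],
                           [2.0, 0.0, 3.0],
                           [5.0, 3.0, 0.0]])

# SciPy's hierarchy functions expect the condensed form (upper triangle only).
condensed = squareform(D)
print(condensed)
```

The condensed vector is only needed for the SciPy route (linkage, fcluster); scikit-learn takes the full square matrix.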

Step 1: build the distance matrix by computing the distance between every pair of data points:

distance_matrix = np.asarray([
    [wasserstein_distance_function(data[first_index], data[second_index]) 
         for first_index in range(len(data))] 
             for second_index in range(len(data))])

This yields the following matrix:

array([[  0.  , 100.8 ,  76.4 ,  96.32],
       [100.8 ,   0.  , 215.  ,  55.68],
       [ 76.4 , 215.  ,   0.  , 186.88],
       [ 96.32,  55.68, 186.88,   0.  ]])
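
The matrix above is symmetric, so half of the metric calls in the nested comprehension are redundant. As a possible refinement (assuming the metric is symmetric and returns zero for identical inputs), the upper triangle can be computed once and mirrored. The names pairwise_matrix and toy_metric below are hypothetical; toy_metric is only a cheap stand-in so the sketch runs on its own.

```python
import numpy as np


def pairwise_matrix(items, metric):
    """Build a symmetric distance matrix, calling metric once per unordered pair."""
    n = len(items)
    D = np.zeros((n, n))  # the diagonal stays zero
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(items[i], items[j])
    return D


def toy_metric(u, v):
    """Stand-in metric (Manhattan distance) used only for this illustration."""
    return float(np.abs(np.asarray(u) - np.asarray(v)).sum())


points = [[0, 0], [1, 1], [4, 0]]
D = pairwise_matrix(points, toy_metric)
print(D)
```

With an expensive metric such as the assignment cost in the question, this roughly halves the running time, since n(n-1)/2 calls replace n*n.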

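Step 2: instantiate the clustering algorithm with the parameters you need:

from sklearn.cluster import AgglomerativeClustering

# Note: scikit-learn 1.2 renamed affinity to metric; on newer versions pass metric="precomputed"
clusterer = AgglomerativeClustering(n_clusters=3, affinity="precomputed", linkage="average", distance_threshold=None)

Step 3: extract the labels: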

clusterer.fit_predict(distance_matrix)

This returns:

array([2, 0, 1, 0], dtype=int64)

Does this do what you wanted?
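
For readers on a different scikit-learn release, here is a hedged end-to-end sketch of steps 2 and 3 that picks whichever keyword the installed version accepts for a precomputed matrix. The inspect-based lookup is my own workaround, not part of the answer, and the matrix values are copied from step 1.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Distance matrix copied from step 1 of the answer.
distance_matrix = np.array([
    [0.0, 100.8, 76.4, 96.32],
    [100.8, 0.0, 215.0, 55.68],
    [76.4, 215.0, 0.0, 186.88],
    [96.32, 55.68, 186.88, 0.0],
])

# Newer releases take metric=..., older ones take affinity=...
params = inspect.signature(AgglomerativeClustering.__init__).parameters
metric_kw = "metric" if "metric" in params else "affinity"

clusterer = AgglomerativeClustering(
    n_clusters=3, linkage="average", **{metric_kw: "precomputed"}
)
labels = clusterer.fit_predict(distance_matrix)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which samples share a label. Here samples 1 and 3 end up together because 55.68 is the smallest off-diagonal distance, which agrees with the output shown in the answer.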
