Explaining the behavior of HDBSCAN clustering

1 answer

User

#1 · Posted 2024-04-25 11:50:07

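As described on the help page, the core of hdbscan is 1) computing the mutual reachability distance and 2) applying the single-linkage algorithm. Since you don't have many data points and the distance metric is precomputed, you can see that your clusters are determined by the single linkage: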

import numpy as np
import hdbscan
import matplotlib.pyplot as plt

# Precomputed pairwise distance matrix for the six points
x = np.array([[0.0, 0.741, 0.344, 1.0, 0.062, 0.084],
              [0.741, 0.0, 0.648, 0.592, 0.678, 0.657],
              [0.344, 0.648, 0.0, 0.648, 0.282, 0.261],
              [1.0, 0.592, 0.655, 0.0, 0.937, 0.916],
              [0.062, 0.678, 0.282, 0.937, 0.0, 0.107],
              [0.084, 0.65, 0.261, 0.916, 0.107, 0.0]])

# metric='precomputed' tells HDBSCAN that x already holds pairwise distances
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=1,
                            metric='precomputed').fit(x)

# Plot the single-linkage dendrogram
clusterer.single_linkage_tree_.plot(cmap='viridis', colorbar=True)
plt.show()

The result will be:

clusterer.labels_

[0 1 0 1 0 0]

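This is because every cluster must contain at least 2 points (min_cluster_size=2), so element 2 cannot form a cluster on its own. The only way to satisfy this is to group elements 0, 2, 4, and 5 together.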
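To make the two steps concrete, here is a minimal NumPy sketch that recomputes the merge order by hand. It is not hdbscan's actual implementation: the helper names `mutual_reachability` and `single_linkage_merges` are made up for illustration, and the core distance is assumed to be the distance to the `min_samples`-th nearest neighbour (not counting the point itself).

```python
import numpy as np

# Distance matrix copied from the answer (note it is not perfectly symmetric).
D = np.array([[0.0, 0.741, 0.344, 1.0, 0.062, 0.084],
              [0.741, 0.0, 0.648, 0.592, 0.678, 0.657],
              [0.344, 0.648, 0.0, 0.648, 0.282, 0.261],
              [1.0, 0.592, 0.655, 0.0, 0.937, 0.916],
              [0.062, 0.678, 0.282, 0.937, 0.0, 0.107],
              [0.084, 0.65, 0.261, 0.916, 0.107, 0.0]])


def mutual_reachability(dist, min_samples=1):
    """max(core(a), core(b), d(a, b)) for every pair (hypothetical helper)."""
    # After sorting each row, column 0 is the point itself (distance 0).
    core = np.sort(dist, axis=1)[:, min_samples]
    return np.maximum(dist, np.maximum(core[:, None], core[None, :]))


def single_linkage_merges(dist):
    """Kruskal-style single linkage: one (i, j, height) tuple per merge."""
    n = len(dist)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Only the upper triangle is used, so each pair appears once.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two separate groups
            parent[ri] = rj
            merges.append((i, j, float(d)))
    return merges


merges = single_linkage_merges(mutual_reachability(D, min_samples=1))
for i, j, height in merges:
    print(f"join {i} and {j} at {height:.3f}")
```

Under these assumptions, elements 0, 4, and 5 join below 0.11, element 2 joins them at 0.261, elements 1 and 3 pair up at 0.592, and everything merges at 0.648. Element 2 is attached to the {0, 4, 5} group long before anything else, which is why it ends up with the same label.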

A quick fix is to simply cut the tree and get the clusters you want:

clusterer.single_linkage_tree_.get_clusters(0.15, min_cluster_size=2)

[ 0 -1 -1 -1  0  0]
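Conceptually, cutting the tree at 0.15 keeps only the links no longer than 0.15 and marks every group smaller than `min_cluster_size` as noise (-1). The sketch below reproduces that idea in plain Python on the raw distances. `labels_at_cut` is a hypothetical helper, not part of hdbscan, and it assumes the raw distances stand in for the mutual reachability distances, which holds here for `min_samples=1` at this cut level.

```python
import numpy as np

# Distance matrix copied from the answer.
D = np.array([[0.0, 0.741, 0.344, 1.0, 0.062, 0.084],
              [0.741, 0.0, 0.648, 0.592, 0.678, 0.657],
              [0.344, 0.648, 0.0, 0.648, 0.282, 0.261],
              [1.0, 0.592, 0.655, 0.0, 0.937, 0.916],
              [0.062, 0.678, 0.282, 0.937, 0.0, 0.107],
              [0.084, 0.65, 0.261, 0.916, 0.107, 0.0]])


def labels_at_cut(dist, cut, min_cluster_size=2):
    """Label connected components of the graph whose edges are <= cut."""
    n = len(dist)
    component = [-1] * n
    n_components = 0
    for start in range(n):
        if component[start] != -1:
            continue
        # Flood-fill everything reachable from `start` through short edges.
        stack = [start]
        component[start] = n_components
        while stack:
            i = stack.pop()
            for j in range(n):
                if component[j] == -1 and min(dist[i, j], dist[j, i]) <= cut:
                    component[j] = n_components
                    stack.append(j)
        n_components += 1

    # Components below min_cluster_size become noise; the rest are renumbered.
    sizes = [component.count(c) for c in range(n_components)]
    new_id, labels = {}, []
    for c in component:
        if sizes[c] < min_cluster_size:
            labels.append(-1)
        else:
            labels.append(new_id.setdefault(c, len(new_id)))
    return labels


labels = labels_at_cut(D, cut=0.15, min_cluster_size=2)
print(labels)  # [0, -1, -1, -1, 0, 0]
```

Only the links 0-4, 0-5, and 4-5 are shorter than 0.15, so elements 1, 2, and 3 are left as singletons and labelled as noise.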

Alternatively, you can just use sklearn.cluster.AgglomerativeClustering, since you are not relying on hdbscan to compute the distance metric.
