Python scikit-learn DBSCAN: wrong coordinates or clusters

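```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Read one tab-separated (x, y) pair per line from the file 'data'
dataSet = []
with open('data') as fileIn:
    for line in fileIn:
        lineArr = line.strip().split('\t')
        dataSet.append([float(lineArr[0]), float(lineArr[1])])
numSamples = len(dataSet)
X = np.array(dataSet)
X = StandardScaler().fit_transform(X)  # each column rescaled to mean 0, std 1
```

```python
# labels, core_samples_mask and n_clusters_ come from a DBSCAN fit on X
# that is not shown in the question.

# Mean of each cluster, in the standardized coordinates
clusters = [np.mean(X[labels == i], axis=0) for i in range(n_clusters_)]
outliers = X[labels == 0]  # DBSCAN labels noise -1; label 0 is the first cluster
print(clusters)
for i in range(n_clusters_):
    # with a single 2-element array, plot() draws its values at x = 0 and x = 1
    plt.plot(clusters[i], '*', markersize=20)

unique_labels = set(labels)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)
    # Core samples: large markers
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)
    # Non-core (border) samples: small markers
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
```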
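Two details in the centroid code above are worth checking. `plt.plot(clusters[i], '*', ...)` receives one 2-element array, which Matplotlib treats as y-values plotted at x = 0 and x = 1, so the stars do not land on the centroids. Also, DBSCAN marks noise with label -1, so `labels == 0` selects the first cluster rather than the outliers. A minimal sketch of both steps, assuming 2-D points; the helper names `cluster_centroids`, `noise_points` and `plot_centroids` are hypothetical:

```python
import numpy as np

def cluster_centroids(X, labels):
    """Return {label: mean point} for every cluster, skipping DBSCAN noise (label -1)."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    return {int(k): X[labels == k].mean(axis=0) for k in np.unique(labels) if k != -1}

def noise_points(X, labels):
    """Points DBSCAN labelled as noise (-1)."""
    return np.asarray(X)[np.asarray(labels) == -1]

def plot_centroids(ax, centroids):
    """Draw each centroid as a single star at (x, y) on a Matplotlib Axes."""
    for center in centroids.values():
        # Pass x and y separately; ax.plot(center, '*') would instead plot
        # the two coordinates against the indices 0 and 1.
        ax.plot(center[0], center[1], '*', markersize=20)
```

With the question's variables, `plot_centroids(plt.gca(), cluster_centroids(X, labels))` draws the stars on the current figure, in whatever coordinates `X` is in.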

1 answer

Anonymous user

Answer #1 · posted 2024-04-16 23:38:29

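Obviously, epsilon has to be chosen for the coordinate system your data is actually in. If you scale the data, the same epsilon no longer corresponds to the same distance. One of the simplest fixes is to compute the cluster means from the unscaled data. But the means of DBSCAN clusters are not reliable anyway: DBSCAN clusters can have arbitrary, non-convex shapes, so a cluster's mean need not even lie inside the cluster.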
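A minimal sketch of "compute the means from the unscaled data", assuming the question's two-column input: cluster on the standardized points, then average the original points per label. The helper name `cluster_means_unscaled` is hypothetical, and `eps`/`min_samples` are placeholders since the question does not show its own DBSCAN call; note that `eps` here is measured in standardized units.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def cluster_means_unscaled(X_raw, eps, min_samples=5):
    """Cluster on standardized data, but report each cluster's mean in the original units."""
    X_raw = np.asarray(X_raw, dtype=float)
    X_scaled = StandardScaler().fit_transform(X_raw)
    # eps is a distance in the standardized space, not in the original units
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    # Average the raw points of each cluster; noise (-1) is skipped
    means = {int(k): X_raw[labels == k].mean(axis=0)
             for k in np.unique(labels) if k != -1}
    return labels, means
```

Because standardization is an affine map, these means equal `StandardScaler.inverse_transform` applied to the scaled means; either way, the caveat about non-convex clusters still applies.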

Judging from your axes, you probably need to reduce epsilon by a factor of about 100.
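Reading a scale factor off the axes is one option. Another common heuristic, which the answer does not mention, is the k-distance plot: sort each point's distance to its k-th nearest neighbour and look for an elbow. A minimal sketch with a hypothetical helper `k_distances`; compute it on the same (scaled or unscaled) array you pass to DBSCAN.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distances(X, k):
    """Sorted distance from each point to its k-th nearest point, counting itself.

    kneighbors() on the training data returns each point itself first
    (distance 0), so the last column is the distance to its (k-1)-th other
    point. With k = min_samples, a point is a DBSCAN core point exactly when
    this distance is <= eps.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    dist, _ = nn.kneighbors(X)
    return np.sort(dist[:, -1])
```

Plotting `k_distances(X, min_samples)` and picking eps near the bend of the curve gives a value on the data's own scale.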

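Since your data are evidently geographic coordinates, you should use the haversine distance (the Earth is not flat) and choose epsilon as a distance that is meaningful for your problem. Getting the units right can be a little tricky: scikit-learn's haversine metric expects [latitude, longitude] in radians and returns distances in radians, so convert a distance in miles (or kilometres) to radians by dividing it by the Earth's radius.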
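A minimal sketch of DBSCAN with the haversine metric, assuming each row is (latitude, longitude) in degrees (swap the columns if the file stores longitude first) and that the points are not passed through `StandardScaler` first. `dbscan_geo` and `EARTH_RADIUS_KM` are hypothetical names; 6371 km is the mean Earth radius (about 3959 miles).

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; use about 3959 for miles

def dbscan_geo(latlon_deg, eps_km, min_samples=5):
    """DBSCAN on (latitude, longitude) rows in degrees with a great-circle eps in km."""
    # The haversine metric expects [lat, lon] in radians
    X = np.radians(np.asarray(latlon_deg, dtype=float))
    # A distance on the sphere divided by its radius is an angle in radians
    eps = eps_km / EARTH_RADIUS_KM
    db = DBSCAN(eps=eps, min_samples=min_samples,
                metric='haversine', algorithm='ball_tree')
    return db.fit_predict(X)
```

For example, `dbscan_geo(points, eps_km=1.0, min_samples=3)` groups points that have at least three points (themselves included) within about 1 km.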
