Python implementation of the OPTICS (clustering) algorithm

User

Reply 1 · edited 2024-05-17 00:19:42

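I'm not aware of a complete Python implementation of OPTICS. The links posted here seem to be only rough approximations of the OPTICS idea. They also don't use an index for acceleration, so they will run in O(n^2), or more likely even O(n^3).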
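To make the complexity point concrete, here is a minimal brute-force sketch of the OPTICS ordering loop. It is not a complete or reference implementation. The names `naive_optics`, `eps` and `min_pts` are my own, and I'm assuming `min_pts` counts the point itself. Every neighborhood query is a linear scan over all points, which is where the O(n^2) comes from when no index is used.

```python
import heapq
import math

def naive_optics(points, eps, min_pts):
    """Brute-force OPTICS ordering (illustrative sketch, no spatial index).

    Returns (order, reach): the visiting order of point indices and the
    reachability distance of each point (math.inf where undefined).
    """
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        # Linear scan over all points: O(n) per query without an index.
        result = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                result.append((d, j))
        return result

    def core_distance(nbrs):
        # Assumption: the neighborhood includes the point itself.
        if len(nbrs) < min_pts:
            return None
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    def update(nbrs, core, heap):
        # Lower the reachability of unprocessed neighbors where possible.
        for d, j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, d)
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(heap, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        heap = []
        current = start
        while current is not None:
            processed[current] = True
            order.append(current)
            nbrs = neighbors(current)
            core = core_distance(nbrs)
            if core is not None:
                update(nbrs, core, heap)
            # Pop the unprocessed point with the smallest reachability,
            # skipping stale heap entries of already processed points.
            current = None
            while heap:
                _, j = heapq.heappop(heap)
                if not processed[j]:
                    current = j
                    break
    return order, reach

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, reach = naive_optics(points, eps=5.0, min_pts=2)
print(order)
print(reach)
```

Replacing `neighbors` with a query against a spatial index (a k-d tree, for example) is what brings the cost of each step down, and that is the part the linked implementations skip.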

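OPTICS has a number of tricky details besides the obvious idea. In particular, the paper proposes thresholding with relative thresholds ("xi") instead of the absolute thresholds used in the code posted here (with an absolute threshold, the result will approximate that of DBSCAN!).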
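As a rough illustration of the difference, the sketch below applies both ideas to a hand-written reachability list (already in cluster order). The function names and the data are made up for the example. `steep_points` only flags the xi-steep positions, which is just the first ingredient of the xi method; the full extraction in the paper goes on to combine steep areas into nested clusters, and that part is not shown here.

```python
import math

def absolute_cut(reach, threshold):
    """Split an ordered reachability list wherever a value exceeds threshold.

    Returns lists of positions. A position above the threshold is dropped,
    which is a simplification: the paper also looks at core distances.
    """
    groups, current = [], []
    for pos, r in enumerate(reach):
        if r <= threshold:
            current.append(pos)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def steep_points(reach, xi):
    """Flag positions whose successor is lower (down) or higher (up)
    by at least the relative factor (1 - xi)."""
    pairs = range(len(reach) - 1)
    down = [i for i in pairs if reach[i] * (1 - xi) >= reach[i + 1]]
    up = [i for i in pairs if reach[i] <= reach[i + 1] * (1 - xi)]
    return down, up

# Two dense valleys (about 0.2 and 0.3) inside a sparser region (about 3).
reach = [math.inf, 3.0, 3.1, 0.2, 0.21, 0.2, 3.0, 0.3, 0.31, 3.2]

low_cut = absolute_cut(reach, 1.0)    # finds only the dense valleys
high_cut = absolute_cut(reach, 3.5)   # merges everything into one group
down, up = steep_points(reach, 0.5)   # relative drops and rises
print(low_cut, high_cut, down, up)
```

No single absolute threshold recovers both the dense valleys and the sparser region around them, whereas the relative test marks the valley borders regardless of the absolute level.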

The original OPTICS paper contains a suggested approach for converting the algorithm's output into actual clusters:

http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/OPTICS.pdf

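The OPTICS implementation in Weka is essentially unmaintained. It doesn't actually produce clusters; it only computes the cluster order. To do that, it makes a duplicate of the database, so it isn't really Weka code.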

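In Java, there seems to be a rather extensive implementation in ELKI, provided by the group that published OPTICS in the first place. You may want to test any other implementation against this "official" version.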

User

Reply 2 · edited 2024-05-17 00:19:42

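While it is not technically OPTICS, there is an HDBSCAN* implementation for Python at https://github.com/lmcinnes/hdbscan. It is equivalent to OPTICS with an infinite maximum epsilon and a different cluster extraction method. Since the implementation gives access to the generated cluster hierarchy, you can also extract clusters from it with more traditional OPTICS methods if you prefer.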

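Note that despite not limiting the epsilon parameter, this implementation still achieves O(n log(n)) performance by using kd-tree and ball-tree based minimum spanning tree algorithms, and it can handle quite large datasets.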

User

Reply 3 · edited 2024-05-17 00:19:42

EDIT: the following is known not to be a complete implementation of OPTICS.

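I did a quick search and found the following (Optics). I can't vouch for its quality, but the algorithm looks pretty simple, so you should be able to validate or adapt it quickly.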

Here is a quick example of how to build clusters from the output of the OPTICS algorithm:

def cluster(order, distance, points, threshold):
    ''' Given the output of the OPTICS algorithm,
    compute the clusters:

    @param order The order of the points
    @param distance The reachability distances of the points
    @param points The actual points
    @param threshold The threshold value to cluster on
    @returns A list of cluster groups
    '''
    clusters = [[]]
    # Walk the points sorted by their order value.
    ordered = sorted(zip(order, distance, points), key=lambda t: t[0])
    for _, dist, point in ordered:
        if dist <= threshold:
            # Reachable within the threshold: extend the current cluster.
            clusters[-1].append(point)
        elif clusters[-1]:
            # Distance above the threshold: close the current cluster.
            clusters.append([])
    # Drop a trailing empty group, if any.
    return [c for c in clusters if c]

# optics() is the function from the implementation mentioned above.
rd, cd, order = optics(points, 4)
print(cluster(order, rd, points, 38.0))
