Threshold-based clustering

Posted 2024-04-19 06:06:58


Edit (simplified)

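I'm fairly sure I'm missing the right term to search for this, so please point me to an existing question if it has already been asked. I have a tree structure, for example: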

(0)->(0,0:7)
     (0,1:9)
(1)->(1,0:6)
     (1,1:2)
     (1,2:1)

For simplicity, let's flatten it into a table:

l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, 1, 2
1, 2, 1
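The flattening step can be sketched as a short recursive generator. This is a minimal sketch that assumes (hypothetically) the tree is held as nested dicts, where an inner node maps a child index to its subtree and a leaf is simply its value v1; because it recurses, it handles any depth, not just two levels.

```python
from typing import Iterator, Union

# Hypothetical representation: an inner node is a dict mapping child index
# to subtree; a leaf is a plain number (its value v1).
Tree = Union[int, float, dict]


def flatten(tree: Tree, path: tuple = ()) -> Iterator[tuple]:
    """Yield one (index, ..., index, value) row per leaf, depth-first."""
    if isinstance(tree, dict):
        for key in sorted(tree):
            yield from flatten(tree[key], path + (key,))
    else:
        # A leaf: emit the path that led here followed by its value.
        yield path + (tree,)


example = {0: {0: 7, 1: 9}, 1: {0: 6, 1: 2, 2: 1}}
for row in flatten(example):
    print(", ".join(str(x) for x in row))
```

For the example tree this prints the same five rows as the table above.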

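Now let's apply a threshold of 3 to this tree. This means we want to keep the nodes above the threshold and, within each branch, merge all nodes that fall below it.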

So the last two rows (both below the cutoff) end up as a single row whose value is the sum of the two:

l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, (1,2),  3
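On the flat table, the merge can be sketched with `itertools.groupby`. This is a minimal sketch under stated assumptions: rows are `(l1, l2, v1)` tuples already sorted by `l1`, a row is merged when `v1 < threshold` (the question does not say what happens at exactly the threshold), and a merged row's `l2` is the tuple of the merged `l2` values.

```python
from itertools import groupby


def merge_below_threshold(rows, threshold):
    """Within each l1 group, keep rows with v1 >= threshold and replace the
    rest with a single row whose l2 lists the merged children and whose v1
    is their sum. Assumes rows are sorted by l1."""
    out = []
    for l1, group in groupby(rows, key=lambda r: r[0]):
        small = []
        for row in group:
            if row[2] >= threshold:
                out.append(row)
            else:
                small.append(row)
        if small:
            ids = tuple(r[1] for r in small)
            # Use a tuple label only when more than one row was merged.
            label = ids if len(ids) > 1 else ids[0]
            out.append((l1, label, sum(r[2] for r in small)))
    return out


rows = [(0, 0, 7), (0, 1, 9), (1, 0, 6), (1, 1, 2), (1, 2, 1)]
for row in merge_below_threshold(rows, threshold=3):
    print(row)
```

With the sample data this yields the four rows shown above, ending with `(1, (1, 2), 3)`. This handles only a two-level table; deeper trees need the merge applied per branch.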

A Python solution would be preferable. I'm happy to deal with some of the edge conditions myself. Note that in reality my tree can be up to 6 levels deep.
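For trees of arbitrary depth, one way to read "merge all nodes in a branch below the threshold" is to compare each child's subtree total against the threshold and merge the small siblings. This is a minimal sketch of that interpretation, not the only one: it assumes the same hypothetical nested-dict representation (inner node = dict, leaf = number) and collapses any child subtree whose total is below the threshold into one merged leaf.

```python
def subtree_total(tree):
    """Sum of all leaf values under `tree` (a leaf is a plain number)."""
    if isinstance(tree, dict):
        return sum(subtree_total(child) for child in tree.values())
    return tree


def collapse(tree, threshold):
    """Return a new tree in which, under every inner node, all children whose
    subtree total is below `threshold` are merged into one leaf keyed by the
    tuple of their original keys (or by the single key if only one)."""
    if not isinstance(tree, dict):
        return tree
    kept, small = {}, []
    for key in sorted(tree):
        child = tree[key]
        total = subtree_total(child)
        if total < threshold:
            small.append((key, total))
        else:
            # Big enough to keep: recurse so deeper levels are collapsed too.
            kept[key] = collapse(child, threshold)
    if small:
        keys = tuple(k for k, _ in small)
        kept[keys if len(keys) > 1 else keys[0]] = sum(t for _, t in small)
    return kept


deep = {0: {0: 7, 1: 9}, 1: {0: 6, 1: {0: 1, 1: 1}, 2: 1}}
print(collapse(deep, 3))  # {0: {0: 7, 1: 9}, 1: {0: 6, (1, 2): 3}}
```

Returning a new tree rather than mutating the input avoids deletion flags altogether.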


1 answer
User
Answer #1 · posted 2024-04-19 06:06:58

So in the end I did it the clumsy way I hinted at earlier. I first added two flags to the node definition (dodelete=False, visited=False).

and updated the add_child method to:

def add_child(self, node):
    # Link the child back to this node and record its depth.
    node.parent = self
    node.level = self.level + 1
    self.children.append(node)
    return node

where self.children is a list of nodes.
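The answer does not show its full node class (it lives in the linked gist), so here is a hypothetical minimal `Node` carrying the attributes the code relies on: `name`, `val`, `id`, `parent`, `level`, `children`, and the two flags. The pre-order `__iter__` is an assumption about how `for n in tree` behaves; note that nodes appended during the walk will also be visited, because list iteration picks up new items.

```python
class Node:
    """Hypothetical minimal node matching the attributes the answer uses."""

    def __init__(self, name, val=None, id=None, dodelete=False, visited=False):
        self.name = name
        self.val = val
        self.id = id
        self.parent = None
        self.level = 0
        self.children = []
        self.dodelete = dodelete  # marked for removal by a collapse pass
        self.visited = visited    # already produced by a roll-up pass

    def add_child(self, node):
        node.parent = self
        node.level = self.level + 1
        self.children.append(node)
        return node

    def __iter__(self):
        """Depth-first, pre-order walk over this node and its descendants."""
        yield self
        for child in self.children:
            yield from child


root = Node("root")
branch = root.add_child(Node("0"))
branch.add_child(Node("0,0", val=7))
print([(n.name, n.level) for n in root])  # [('root', 0), ('0', 1), ('0,0', 2)]
```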

Then come two functions:

def collapse_nodes(tree, thresh=3):
    # Visit every node, skipping nodes already flagged for deletion.
    for n in tree:
        if n.dodelete:
            continue
        sub_tree = tree.get_by_path(n.id)
        sub_tree_stack = []
        for child in sub_tree.children:
            # Collect children below the threshold and flag them for deletion.
            if child.val is not None and child.val < thresh:
                sub_tree_stack.append(child)
                tree.get_by_path(child.id).dodelete = True
        # Add one merged node after all children have been scanned.
        if sub_tree_stack:
            sub_tree.add_child(Node(",".join(subnode.name for subnode in sub_tree_stack),
                                    val=sum(subnode.val for subnode in sub_tree_stack),
                                    id=min(subnode.id for subnode in sub_tree_stack)))
    return tree

And:

def roll_up(tree, thresh=2, level=5):
    # Only process live nodes at the requested level that were not
    # themselves created by an earlier roll-up.
    for n in tree:
        if n.dodelete or n.visited:
            continue
        if n.level != level:
            continue
        sub_tree = tree.get_by_path(n.id)
        if sub_tree is None:
            continue
        sub_tree_stack = []
        for child in sub_tree.children:
            if child.val is not None and child.val <= thresh:
                sub_tree_stack.append((child.name, child.id, child.val))
                # Also mark this for deletion
                tree.get_by_path(child.id).dodelete = True
        if sub_tree_stack:
            # Build the replacement node, named e.g. "n: [a,b]", whose value
            # is the sum of the small children
            node_name = n.name + ": [" + ",".join(name for name, _, _ in sub_tree_stack) + "]"
            node_val = sum(val for _, _, val in sub_tree_stack)
            parent_id = n.parent.id
            # Now ensure that you delete the old node before adding new
            tree.get_by_path(n.id).dodelete = True
            tree.get_by_path(parent_id).add_child(Node(node_name, val=node_val, id=n.id, visited=True))
    return tree
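Because `roll_up` handles a single level per call (via its `level` parameter), a tree up to 6 levels deep needs some driver that calls it once per level. The answer does not show that driver, so the following is only an assumption: a helper that groups nodes by depth and yields levels deepest first, so merges at lower levels happen before their parents are examined. The minimal `Node` here is hypothetical.

```python
class Node:
    """Hypothetical minimal node: a name and a list of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])


def levels_deepest_first(root):
    """Return [(level, [nodes...]), ...] from the deepest level up to the
    root (level 0); nodes keep left-to-right order within a level."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        # Breadth-first: the next frontier is every child of this level.
        frontier = [child for node in frontier for child in node.children]
    return list(enumerate(levels))[::-1]


tree = Node("root", [Node("0", [Node("0,0"), Node("0,1")]), Node("1", [Node("1,0")])])
for level, nodes in levels_deepest_first(tree):
    # A per-level pass such as roll_up(tree, thresh, level=level) could go here.
    print(level, [n.name for n in nodes])
```

This prints levels 2, 1 and 0 in that order.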

A somewhat convoluted approach, but it works. I created a gist at https://gist.github.com/fahaddaniyal/0dc86c80f266fd9f8cdb for any intrepid soul who wants to try it out.
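A less stateful variant of `collapse_nodes` rebuilds the tree instead of flagging nodes with `dodelete`, which also avoids adding children to a list while it is being iterated. This is a minimal sketch using a hypothetical dataclass `Node`; like the answer, it merges children with `val < thresh`, but it additionally assumes only leaf children (no children of their own) are merged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    val: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def collapse_children(node: Node, thresh: float) -> Node:
    """Return a copy of `node` in which, at every level, leaf children with
    val < thresh are replaced by one merged leaf (names joined, vals summed)."""
    kept, small = [], []
    for child in node.children:
        if child.val is not None and child.val < thresh and not child.children:
            small.append(child)
        else:
            kept.append(collapse_children(child, thresh))
    if small:
        kept.append(Node(",".join(c.name for c in small), sum(c.val for c in small)))
    return Node(node.name, node.val, kept)


tree = Node("root", children=[
    Node("0", children=[Node("(0,0)", 7), Node("(0,1)", 9)]),
    Node("1", children=[Node("(1,0)", 6), Node("(1,1)", 2), Node("(1,2)", 1)]),
])
out = collapse_children(tree, 3)
print([(c.name, c.val) for c in out.children[1].children])
# [('(1,0)', 6), ('(1,1),(1,2)', 3)]
```

The input tree is left untouched, so no deletion flags or cleanup pass are needed.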
