Add a hierarchy level based on reporting relationships

Posted 2024-06-16 12:39:45


Data (df):

child parent
b     a
c     a
d     b
e     c
f     c
g     f

Expected output:

child   parent  level
b       a       1
c       a       1
d       b       2
e       c       2
f       c       2
g       f       3

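In this child/parent reporting table, "a" is the top-level parent because it does not report to anyone. "b" and "c" report to "a", so their level is 1. "d", "e" and "f" report to level-1 nodes (b, c), so their level is 2. "g" reports to "f" (which is level 2), so "g" is level 3. Please let me know how to compute this.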

I tried the code below, but it does not work:

df['Level'] = np.where(df['parent'] == 'a',"level 1",np.nan)
dfm1 = pd.Series(np.where(df['Level'] == 'level 1', df['parent'],None))
df.loc[df['parent'].isin(dfm1),'Level'] = "level 2"

2 Answers

Here is a solution from first principles:

import pandas as pd

# We will build the tree of relationships, using a helper node class
class Node:
    def __init__(self, value, parent=None, level=0):
        self.value = value
        self.parent = parent
        self.level = level
        self.children = []
    
    def set_child(self, child):
        child.level = self.level + 1
        self.children.append(child)

# Helper function to insert nodes
def insert(node, new_node):
    if new_node.parent == node.value:
        # if the new node is a child, insert it
        node.set_child(new_node)
    else:
        # otherwise, iterate over the children until you find its parent
        if node.children:
            for child in node.children:
                insert(child, new_node)

# gather the level information for the tree
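def node_print(node, values=None):
    # avoid a mutable default argument, which would be shared across calls
    if values is None:
        values = []
    if node.parent:
        values.append((node.value, node.parent, node.level))
    for child in node.children:
        values = node_print(child, values=values)
    return values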

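# Now get the data and build the tree
# (the first row's parent is taken as the root, and each parent must
# already be in the tree before its children are inserted)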
data = """b     a
c     a
d     b
e     c
f     c
g     f"""


rows = [y.split() for y in data.split("\n")]

for index, (child, parent) in enumerate(rows):
    if index == 0:
        node = Node(value=parent)
    
    child_node = Node(value=child, parent=parent)
    insert(node, child_node)

output = pd.DataFrame(data=node_print(node, values=[]), columns=['child', 'parent', 'level']).sort_values(by='level')

print(output)

  child parent  level
0     b      a      1
2     c      a      1
1     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
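The same levels can also be obtained without a tree class, by walking up a child-to-parent mapping and counting the steps to the root. This is only a minimal sketch: `parent_of` and `level_of` are hypothetical names, and it assumes every child has exactly one parent and that the data contains no cycles.

```python
# Hypothetical helper names; assumes one parent per child and no cycles.
rows = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "c"), ("f", "c"), ("g", "f")]

# Map each child to its parent; the root "a" never appears as a key.
parent_of = {child: parent for child, parent in rows}

def level_of(name):
    """Count the steps from `name` up to the root."""
    level = 0
    while name in parent_of:
        name = parent_of[name]
        level += 1
    return level

levels = {child: level_of(child) for child, _ in rows}
print(levels)
```

With the DataFrame from the question, the mapping could be built with `dict(zip(df['child'], df['parent']))` and applied with `df['child'].map(level_of)`. Unlike the tree insertion above, this does not depend on the order of the rows.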


Here is an approach using networkx: for each child we find all of its ancestors, and the number of ancestors gives the level.

import networkx as nx

G = nx.from_pandas_edgelist(df,"parent","child",create_using=nx.DiGraph())
f = lambda x: len(nx.ancestors(G,x))
df['level'] = df['child'].map(f)

print(df)

  child parent  level
0     b      a      1
1     c      a      1
2     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
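Counting ancestors with `nx.ancestors` matches the level here because every node has a single chain of parents. If a node could report to more than one parent (an assumption, not the case in the question's data), the ancestor count could exceed the depth. A possible variant, sketched below, measures the distance from the root instead; `roots` and `depth` are hypothetical names.

```python
import networkx as nx

# Same relationships as in the question, written as (parent, child) edges.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("f", "g")]
G = nx.DiGraph(edges)

# Roots are the nodes that report to nobody, i.e. with no incoming edge.
roots = [n for n, deg in G.in_degree() if deg == 0]

# Distance (number of edges) from a root to every node reachable from it;
# if several roots reach the same node, keep the smallest distance.
depth = {}
for root in roots:
    for node, dist in nx.single_source_shortest_path_length(G, root).items():
        depth[node] = min(dist, depth.get(node, dist))

print(depth)
```

With the DataFrame from the answer above, the result could be attached with `df['level'] = df['child'].map(depth)`.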
