如何从Pandas数据框生成完整层次图?
有一个Pandas的数据框,里面存放着一些层级数据,也就是有上下级关系的数据。
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )
这些数据包含了每个经理的所有下属关系。比如说,对于每个经理的ID(比如“ A”),员工的ID包括了直接由经理“A”管理的员工(比如“ B”),以及由员工“ B”管理的员工(比如“ C”、“ D”)。那么,如何把这些关系表示成一个有向图呢?这个图的边应该是
[('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E')]
2 个回答
1
一开始我没有完全明白你想要实现什么,现在我已经想出了一个循环的方法来做到这一点:
import networkx as nx
import pandas as pd
man_id ='A'
NodeCons=[]
employees =[]
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )
def searchEmployees(man):
employees =[]
for index,row in data.iterrows():
if row['manager_id'] == man:
NodeCons.append((row['manager_id'],row['employee_id']))
employees.append(row['employee_id'])
searchEmployees(row['employee_id'])
searchEmployees(man_id)
然后从这个列表中创建一个图:
G = nx.Graph()
G.add_edges_from(NodeCons)
nx.draw_networkx(G)
2
首先,把数据框(DataFrame)当作一个有向图来读取,可以使用from_pandas_edgelist
这个方法。接着,用topological_sort
来对节点进行排序,只保留每个子节点中,父节点具有最高拓扑索引的边。
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
create_using=nx.DiGraph)
# topological order
order = {n: i for i,n in enumerate(nx.topological_sort(G))}
# {'A': 0, 'B': 1, 'D': 2, 'C': 3, 'E': 4}
# for each node, only keep the parent that has the greatest topological index
parents = {}
for source, target in G.edges():
p = parents.setdefault(target, source)
if order[source] > order[p]:
parents[target] = source
# parents
# {'C': 'B', 'E': 'C', 'B': 'A', 'D': 'A'}
# remove shorter edges
G.remove_edges_from(G.edges - set(zip(parents.values(), parents.keys())))
# or
# G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))
输出结果:
过滤前的图:
变体
你也可以使用topological_generations
来计算一个顺序,这样同一代的节点会有相同的编号。
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
关于你提的另一个问题,你也可以计算相对代际差异,只保留两个节点之间代际差异为1的边。
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
create_using=nx.DiGraph)
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
# compute the relative generation difference
# keep the edges with a difference of 1
keep = [e for e in G.edges if order[e[1]]-order[e[0]] == 1]
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E'), ('F', 'G')]
G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))