Pandas: determining a category tree id from two columns

Published 2024-06-16 11:14:17


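I have a category tree set up as follows. The top level is defined by parent_id = -1 (in this example there are two top-level nodes, Linear Asset and Point Asset).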

asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4}
]

I also have a DataFrame of assets, defined as follows:

import pandas as pd

df = pd.DataFrame({
    'name': ['pipe_1','pipe_2','pipe_3','hydrant_1', 'hydrant_2', 'fountain_1', 'fountain_2'],
    'level_1': ['Linear Asset','Linear Asset','Linear Asset','Point Asset','Point Asset','Point Asset','Point Asset'],
    'level_2': ['Main','Lateral','Lateral','Hydrant','Hydrant','Fountain','Fountain']
})

So the DataFrame looks like this:

         name       level_1   level_2
0      pipe_1  Linear Asset      Main
1      pipe_2  Linear Asset   Lateral
2      pipe_3  Linear Asset   Lateral
3   hydrant_1   Point Asset   Hydrant
4   hydrant_2   Point Asset   Hydrant
5  fountain_1   Point Asset  Fountain
6  fountain_2   Point Asset  Fountain

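I want a function that finds the id of the lowest level of the tree (level_2 in this example). For my sample code, the desired DataFrame output is shown below. I would also like the function to work whether I have 1, 2, or 3 levels.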

         name       level_1   level_2 tree_id
0      pipe_1  Linear Asset      Main       3
1      pipe_2  Linear Asset   Lateral       2
2      pipe_3  Linear Asset   Lateral       2
3   hydrant_1   Point Asset   Hydrant       6
4   hydrant_2   Point Asset   Hydrant       6
5  fountain_1   Point Asset  Fountain       5
6  fountain_2   Point Asset  Fountain       5

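I came up with the following function, but it has a couple of problems:

  • It does not work (it gives me the level 1 id instead of the level 2 id)
  • Ideally it would also handle a more complex tree, for example three levels (level_1, level_2, and level_3) instead of only two

def find_tree_id(branches, tree):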
    tree_id = None
    number_of_branches = len(branches)
    parent_id = -99
    for i in range(0,number_of_branches):
        for j in range(0,len(tree)):
            if i == 0 and branches[i] == tree[j]['name']:
                parent_id = tree[j]['parent_id']
                tree_id = tree[j]['id']
            if parent_id == -1:
                return tree_id
    return tree_id

tree_ids = []
for i, row in df.iterrows():
    tree_id = find_tree_id([row['level_1'], row['level_2']], asset_tree)
    tree_ids.append(tree_id)
df['tree_id'] = tree_ids
print(df)

The incorrect output is:

         name       level_1   level_2  tree_id
0      pipe_1  Linear Asset      Main        1
1      pipe_2  Linear Asset   Lateral        1
2      pipe_3  Linear Asset   Lateral        1
3   hydrant_1   Point Asset   Hydrant        4
4   hydrant_2   Point Asset   Hydrant        4
5  fountain_1   Point Asset  Fountain        4
6  fountain_2   Point Asset  Fountain        4

2 answers

Here is the solution I came up with. I believe it is more robust than the answers given so far, but it is not perfect.

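I converted the tree to a DataFrame and wrote a recursive function to derive the concatenated names. The recursion gives the flexibility to handle more levels when needed. I added these names as a column called "flat_levels".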

def get_flatname_from_tree(tree_df):
    return tree_df['id'].apply(_flatname_recurse, df=tree_df)

def _flatname_recurse(ID, df):
    # Positional lookup: assumes ids run 1..n in the same order as the rows
    row = df.iloc[ID - 1]
    if row['parent_id'] == -1:
        return row['name']
    else:
        return _flatname_recurse(row['parent_id'], df=df) + ' : ' + row['name']

tree = pd.DataFrame(asset_tree)
tree['flat_levels'] = get_flatname_from_tree(tree)
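
One fragile spot in the recursion above is `df.iloc[ID - 1]`, which only finds the right row when the ids happen to be 1, 2, 3, ... in row order. A minimal sketch of the same idea with a label-based lookup follows; `tree_by_id` and `flat_name` are names made up for this illustration, and it assumes every `parent_id` other than -1 refers to an existing id.

```python
import pandas as pd

asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4},
]

# Index the tree by id so rows are found by label, not by position.
tree_by_id = pd.DataFrame(asset_tree).set_index('id')

def flat_name(node_id, tree_df):
    """Join the names from the top-level ancestor down to node_id."""
    row = tree_df.loc[node_id]
    if row['parent_id'] == -1:
        return row['name']
    return flat_name(row['parent_id'], tree_df) + ' : ' + row['name']

tree_by_id['flat_levels'] = [flat_name(i, tree_by_id) for i in tree_by_id.index]
```

With this layout the ids can be sparse or unordered, since `.loc` looks rows up by index label.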

Then I wrote a function to concatenate the levels in df:

def concatentate_levels(x, num_levels):
    all_levels = ""
    for i in range(1,num_levels+1):
        all_levels = all_levels + x['level_' + str(i)] + " : "
    all_levels = all_levels[:-3]  # removes the trailing " : " separator
    return all_levels

n_levels = len(df.filter(like='level_').columns)
df['flat_levels'] = df.apply(concatentate_levels, args=[n_levels], axis=1)
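
The concatenation can probably be written more compactly by joining the level columns row by row. This is only a sketch: `assets_df`, `level_cols`, and `flat` are illustrative names, and it assumes every level value is a non-null string and that the columns are named level_1, level_2, and so on.

```python
import pandas as pd

assets_df = pd.DataFrame({
    'name': ['pipe_1', 'pipe_2', 'hydrant_1'],
    'level_1': ['Linear Asset', 'Linear Asset', 'Point Asset'],
    'level_2': ['Main', 'Lateral', 'Hydrant'],
})

# Sort the level columns by their numeric suffix so level_10 comes after level_2.
level_cols = sorted(
    assets_df.filter(like='level_').columns,
    key=lambda col: int(col.split('_')[1]),
)

# Join the values of each row with the same " : " separator used for the tree.
flat = assets_df[level_cols].apply(' : '.join, axis=1)
```

Here `flat` is a Series aligned with `assets_df`, so it can be assigned directly as the `flat_levels` column.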

Finally, I merged the two DataFrames on the "flat_levels" column:

df = pd.merge(df, tree[["id","flat_levels"]], on="flat_levels")
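
Note that `pd.merge` defaults to an inner join, so any asset whose path is not in the tree is silently dropped. If that matters, a left join keeps those rows with a missing id. The sketch below uses small made-up frames; `valve_1` and its path are hypothetical and only there to show an unmatched row.

```python
import pandas as pd

# Hypothetical sample data: 'valve_1' has a path that is not in the tree.
assets = pd.DataFrame({
    'name': ['pipe_1', 'valve_1'],
    'flat_levels': ['Linear Asset : Main', 'Point Asset : Valve'],
})
paths = pd.DataFrame({
    'id': [3, 6],
    'flat_levels': ['Linear Asset : Main', 'Point Asset : Hydrant'],
})

# how='left' keeps every asset; validate raises if a path maps to several ids.
merged = assets.merge(paths, on='flat_levels', how='left', validate='many_to_one')
merged = merged.rename(columns={'id': 'tree_id'})

# Nullable integer dtype so matched ids stay integers next to missing ones.
merged['tree_id'] = merged['tree_id'].astype('Int64')
```

The `validate='many_to_one'` check is optional, but it catches the case where two tree nodes produce the same flattened path.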

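You can first convert asset_tree into a nested dictionary that stores the relationships between levels. You can then use a recursive generator function that takes a row of levels and walks the new tree, using the names in the row to get the id of the rightmost name in the levels: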

import pandas as pd
asset_tree = [{'id': 1, 'name': 'Linear Asset', 'parent_id': -1}, {'id': 2, 'name': 'Lateral', 'parent_id': 1}, {'id': 3, 'name': 'Main', 'parent_id': 1}, {'id': 4, 'name': 'Point Asset', 'parent_id': -1}, {'id': 5, 'name': 'Fountain', 'parent_id': 4}, {'id': 6, 'name': 'Hydrant', 'parent_id': 4}]
a_tree = {i['id']:i for i in asset_tree}
assets = {'name': ['pipe_1', 'pipe_2', 'pipe_3', 'hydrant_1', 'hydrant_2', 'fountain_1', 'fountain_2'], 'level_1': ['Linear Asset', 'Linear Asset', 'Linear Asset', 'Point Asset', 'Point Asset', 'Point Asset', 'Point Asset'], 'level_2': ['Main', 'Lateral', 'Lateral', 'Hydrant', 'Hydrant', 'Fountain', 'Fountain']}
def to_dict(_id):
    return {i['id']: to_dict(i['id']) for i in asset_tree if i['parent_id'] == _id}

new_tree = {i['id']: to_dict(i['id']) for i in asset_tree if i['parent_id'] == -1}

def get_id(d, row, c=[]):
    if not row:
        yield c
    else:
        for a, b in d.items():
            if row[0] == a_tree[a]['name']:
                yield from get_id(b, row[1:], c + [a])

result = [dict(zip([*assets, 'tree_id'], [a, *b, next(get_id(new_tree, b))[-1]])) 
          for a, *b in zip(*assets.values())]
df = pd.DataFrame(result)

Output:

         name       level_1   level_2  tree_id
0      pipe_1  Linear Asset      Main        3
1      pipe_2  Linear Asset   Lateral        2
2      pipe_3  Linear Asset   Lateral        2
3   hydrant_1   Point Asset   Hydrant        6
4   hydrant_2   Point Asset   Hydrant        6
5  fountain_1   Point Asset  Fountain        5
6  fountain_2   Point Asset  Fountain        5
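
For comparison, the same walk from the top level down can be sketched with a flat dictionary keyed on (parent_id, name) instead of a nested tree. This is not part of either answer above; `child_id`, `assets_df`, and this version of `find_tree_id` are illustrative names. It assumes names are unique among siblings and that the level columns appear in order (level_1, level_2, ...) in the DataFrame.

```python
import pandas as pd

asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4},
]

assets_df = pd.DataFrame({
    'name': ['pipe_1', 'pipe_2', 'pipe_3', 'hydrant_1', 'hydrant_2', 'fountain_1', 'fountain_2'],
    'level_1': ['Linear Asset', 'Linear Asset', 'Linear Asset', 'Point Asset', 'Point Asset', 'Point Asset', 'Point Asset'],
    'level_2': ['Main', 'Lateral', 'Lateral', 'Hydrant', 'Hydrant', 'Fountain', 'Fountain'],
})

# Key on (parent_id, name) so the same name under different parents stays distinct.
child_id = {(node['parent_id'], node['name']): node['id'] for node in asset_tree}

def find_tree_id(branches, lookup, root=-1):
    """Follow the names in branches from the root; return the last id, or None if a name is missing."""
    node_id = root
    for name in branches:
        node_id = lookup.get((node_id, name))
        if node_id is None:
            return None
    return node_id

level_cols = [col for col in assets_df.columns if col.startswith('level_')]
assets_df['tree_id'] = [
    find_tree_id(row, child_id)
    for row in assets_df[level_cols].itertuples(index=False)
]
```

Because the loop simply follows however many names it is given, the same function works for one, two, or three level columns.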
