How to convert a pandas DataFrame into a dictionary for Newick format

Posted 2024-04-27 15:40:36

I have the following dataset:

import pandas as pd
df = pd.DataFrame([['root', 'b', 'a', 'leaf1'],
                   ['root', 'b', 'a', 'leaf2'],
                   ['root', 'b', 'leaf3', ''],
                   ['root', 'b', 'leaf4', ''],
                   ['root', 'c', 'leaf5', ''],
                   ['root', 'c', 'leaf6', '']],
                   columns=['col1', 'col2', 'col3', 'col4'])

Since I have not found a way to convert it directly to Newick format, I would like to convert it into a dictionary of the following form:

node_to_children = {
    'root': {'b': 0, 'c': 0},
    'a': {'leaf1': 0, 'leaf2': 0},
    'b': {'a': 0, 'leaf3': 0, 'leaf4': 0},
    'c': {'leaf5': 0, 'leaf6': 0}
}

I could then convert node_to_children to Newick format. But how do I convert the pandas DataFrame into this dictionary in the first place?


1 answer
User
Answer 1 · posted 2024-04-27 15:40:36

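I am assuming that every row of the DataFrame represents one complete branch of the tree, from the root to a leaf. Based on that, I came up with the following solution. Each step of the algorithm is commented in the code below, but feel free to ask if anything is unclear.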

node_to_children = {}

# Iterate over the DataFrame row by row, assuming that every row
# stands for one complete root-to-leaf branch of the tree.
for row in df.itertuples(index=False):
    # Drop empty strings, which mark columns that hold no node.
    row_list = [element for element in row if element != ""]
    # Walk over consecutive (parent, child) pairs along the branch.
    for parent, child in zip(row_list, row_list[1:]):
        # Create the parent's entry if it is missing, then register the
        # child with value 0; an already known child is left unchanged.
        node_to_children.setdefault(parent, {}).setdefault(child, 0)


Output:

print(node_to_children)
        
{'root': {'b': 0, 'c': 0}, 
 'b': {'a': 0, 'leaf3': 0, 'leaf4': 0}, 
 'a': {'leaf1': 0, 'leaf2': 0}, 
 'c': {'leaf5': 0, 'leaf6': 0}}
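
Once `node_to_children` has this shape, the remaining step the question mentions (producing a Newick string) could be sketched as a small recursive function. This is only a minimal sketch under some assumptions: the helper name `to_newick` is hypothetical, the `0` values are ignored rather than written as branch lengths, internal nodes are emitted as labels after their closing parenthesis, and node names are assumed to contain no characters that would need quoting in Newick.

```python
def to_newick(node_to_children, root):
    """Serialize a parent -> {child: value} mapping as a Newick string.

    Hypothetical helper: the dict values are ignored (no branch lengths),
    and node names are assumed to need no quoting.
    """
    def render(node):
        children = node_to_children.get(node)
        if not children:
            # A node without an entry in the mapping is a leaf.
            return node
        # Internal node: render all children, then append the node's label.
        inner = ",".join(render(child) for child in children)
        return f"({inner}){node}"

    # A Newick string is terminated by a semicolon.
    return render(root) + ";"


node_to_children = {
    'root': {'b': 0, 'c': 0},
    'b': {'a': 0, 'leaf3': 0, 'leaf4': 0},
    'a': {'leaf1': 0, 'leaf2': 0},
    'c': {'leaf5': 0, 'leaf6': 0},
}

newick = to_newick(node_to_children, 'root')
print(newick)
# (((leaf1,leaf2)a,leaf3,leaf4)b,(leaf5,leaf6)c)root;
```

The root has to be passed explicitly here because the mapping itself does not record which key is the top of the tree.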
