Vertically positioning nodes in a Sankey diagram to avoid collisions with links

Published 2024-05-19 18:41:24


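I'm trying to draw a Sankey plot with Plotly that filters certain documents into in-scope or out-of-scope, i.e. 1 source and 2 targets. However, some documents are filtered out in step 1, some in step 2, and so on. This results in the following Sankey plot: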

Current output

Ideally, I would like it to look like this:

Ideal output

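I have already looked through the documentation at https://plot.ly/python/reference/#sankey, but I couldn't find what I'm looking for. Ideally, I would like to enable a feature that prevents the plot from overlapping nodes and links.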

Here is the code I use to generate the plot object:

import math
from random import shuffle

import pandas as pd
import plotly.graph_objects as go


def genSankeyPlotObject(df, cat_cols=None, value_cols='', visible=False):
    # Avoid a mutable default argument
    cat_cols = list(cat_cols) if cat_cols is not None else []

    ### COLOR PALETTE TO USE (hex codes without the leading '#')
    colorPalette = ['472d3c', '5e3643', '7a444a', 'a05b53', 'bf7958', 'eea160', 'f4cca1', 'b6d53c', '71aa34', '397b44',
                    '3c5956', '302c2e', '5a5353', '7d7071', 'a0938e', 'cfc6b8', 'dff6f5', '8aebf1', '28ccdf', '3978a8',
                    '394778', '39314b', '564064', '8e478c', 'cd6093', 'ffaeb6', 'f4b41b', 'f47e1b', 'e6482e', 'a93b3b',
                    '827094', '4f546b']

    ### CREATES LABELLIST FROM DEFINED COLUMNS (unique labels, first-seen order)
    labelList = []
    for catCol in cat_cols:
        labelList += list(df[catCol].unique())
    labelList = list(dict.fromkeys(labelList))

    ### DEFINES THE NUMBER OF COLORS: ONE PER NODE LABEL
    colorNum = len(labelList)
    # Repeat the palette often enough to cover every label
    TempcolorPallet = colorPalette * math.ceil(colorNum / len(colorPalette))
    shuffle(TempcolorPallet)
    # Prefix '#' so each entry is a valid hex color string
    colorList = ['#' + c for c in TempcolorPallet[:colorNum]]

    ### TRANSFORMS DF INTO SOURCE -> TARGET PAIRS (one frame per adjacent column pair)
    pairs = []
    for i in range(len(cat_cols) - 1):
        tempDf = df[[cat_cols[i], cat_cols[i + 1], value_cols]].copy()
        tempDf.columns = ['source', 'target', 'count']
        pairs.append(tempDf)
    sourceTargetDf = pd.concat(pairs)
    # Sum the counts of identical source -> target pairs
    sourceTargetDf = sourceTargetDf.groupby(['source', 'target']).agg({'count': 'sum'}).reset_index()

    ### ADDING INDEX TO SOURCE -> TARGET PAIRS
    sourceTargetDf['sourceID'] = sourceTargetDf['source'].apply(lambda x: labelList.index(x))
    sourceTargetDf['targetID'] = sourceTargetDf['target'].apply(lambda x: labelList.index(x))

    ### CREATES THE SANKEY PLOT OBJECT
    data = go.Sankey(node=dict(pad=15,
                               thickness=20,
                               line=dict(color="black",
                                         width=0.5),
                               label=labelList,
                               color=colorList),
                     link=dict(source=sourceTargetDf['sourceID'],
                               target=sourceTargetDf['targetID'],
                               value=sourceTargetDf['count']),
                     valuesuffix=' ' + value_cols,
                     visible=visible)

    return data
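
One possible direction for the node/link overlap described above is to place the nodes explicitly. Plotly's Sankey trace accepts normalized `node.x` and `node.y` coordinates together with an `arrangement` setting. The following is only a minimal sketch under assumptions: the helper names `column_positions` and `build_sankey`, the example labels, and the link values are hypothetical and not taken from the question's data. The coordinates are kept slightly inside (0, 1) because values of exactly 0 have been reported to be ignored. This does not detect overlaps automatically; it only lets you choose which node sits where in each column.

```python
def column_positions(columns, lo=0.001, hi=0.999):
    """Map columns of node labels to normalized (x, y) positions.

    `columns` is a list of lists: columns from left to right, each
    listing its node labels from top to bottom (hypothetical layout input).
    """
    positions = {}
    n_cols = len(columns)
    for c, labels in enumerate(columns):
        # Spread the columns evenly between lo and hi
        x = lo if n_cols == 1 else lo + (hi - lo) * c / (n_cols - 1)
        for r, label in enumerate(labels):
            # Put each node in the middle of its own equal-height slot
            y = lo + (hi - lo) * (r + 0.5) / len(labels)
            positions[label] = (round(x, 3), round(y, 3))
    return positions


def build_sankey(labels, positions, source, target, value):
    """Build a Sankey figure with explicitly positioned nodes."""
    # Imported here so the layout helper above works without plotly installed
    import plotly.graph_objects as go

    return go.Figure(go.Sankey(
        arrangement="snap",
        node=dict(
            label=labels,
            x=[positions[label][0] for label in labels],
            y=[positions[label][1] for label in labels],
            pad=15,
            thickness=20,
        ),
        link=dict(source=source, target=target, value=value),
    ))


# Hypothetical example: out-of-scope nodes are listed last in each column
columns = [
    ["All documents"],
    ["Passed step 1", "Out of scope (step 1)"],
    ["In scope", "Out of scope (step 2)"],
]
labels = [label for column in columns for label in column]
positions = column_positions(columns)

# Links as indices into `labels` (made-up values, for illustration only)
source = [0, 0, 1, 1]
target = [1, 2, 3, 4]
value = [70, 30, 50, 20]
```

With plotly installed, `build_sankey(labels, positions, source, target, value).show()` should render the figure. As far as I know, `y` for Sankey nodes runs from top to bottom, so nodes listed later in a column end up lower; reordering the labels within a column is then the way to keep a filtered-out node from sitting in the path of a link.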
