Which clustering distance metric finds the most closely related groups of items?

+----------+------------+---------+----------+
| Location | Units Sold | Revenue | Footfall |
+----------+------------+---------+----------+
| Loc - 01 |        100 |   1,150 |       85 |
| Loc - 02 |        100 |   1,250 |       60 |
| Loc - 03 |         90 |     990 |       90 |
| Loc - 04 |        120 |   1,200 |       98 |
| Loc - 05 |        115 |   1,035 |       87 |
| Loc - 06 |         89 |   1,157 |       74 |
| Loc - 07 |        110 |   1,265 |       80 |
+----------+------------+---------+----------+

1 Answer

User

#1 · Posted 2024-04-20 14:36:01

First, set the DataFrame's index to the Location column so rows are easier to look up:

df1 = df1.set_index('Location')
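The answer assumes that `df1` already holds the table from the question. A minimal sketch of building it with pandas follows; storing Revenue as plain integers (thousands separators removed) is an assumption made here, not something the question specifies.

```python
import pandas as pd

# Values copied from the question's table.
# Assumption: Revenue is stored as integers, e.g. "1,150" becomes 1150.
df1 = pd.DataFrame({
    "Location": ["Loc - 01", "Loc - 02", "Loc - 03", "Loc - 04",
                 "Loc - 05", "Loc - 06", "Loc - 07"],
    "Units Sold": [100, 100, 90, 120, 115, 89, 110],
    "Revenue": [1150, 1250, 990, 1200, 1035, 1157, 1265],
    "Footfall": [85, 60, 90, 98, 87, 74, 80],
})

# Use the location label as the row index so rows can be fetched by name.
df1 = df1.set_index("Location")

print(df1.loc["Loc - 04"])
```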

Next, generate every pair of locations to compare:

import itertools
pairs = list(itertools.combinations(df1.index.values, 2))
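For the seven locations in the question, this gives 7 × 6 / 2 = 21 unordered pairs. A small standalone check, using a plain list of labels as a stand-in for `df1.index.values`:

```python
import itertools

# Stand-in for df1.index.values: the seven location labels from the question.
locations = [f"Loc - {i:02d}" for i in range(1, 8)]

# combinations(..., 2) yields each unordered pair exactly once and never (a, a).
pairs = list(itertools.combinations(locations, 2))

print(len(pairs))  # 21
print(pairs[0])    # ('Loc - 01', 'Loc - 02')
```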

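Next, define a comparison function. Let's use the one from the previous post, a Euclidean distance over the three numeric columns: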

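import numpy as np

def compare_function(row1, row2):
    # Euclidean distance between two rows across the three numeric columns.
    # Each row is a Series, so columns are read by label on both sides.
    return np.sqrt((row1['Units Sold'] - row2['Units Sold'])**2 +
                   (row1['Revenue'] - row2['Revenue'])**2 +
                   (row1['Footfall'] - row2['Footfall'])**2)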
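As a quick sanity check, here is the Euclidean formula worked through for Loc - 01 and Loc - 02 from the question's table, using only the standard library. The tuples are an illustrative stand-in for the DataFrame rows.

```python
import math

# Rows from the question's table: (Units Sold, Revenue, Footfall).
loc_01 = (100, 1150, 85)
loc_02 = (100, 1250, 60)

# Squared difference per column.
squared = [(a - b) ** 2 for a, b in zip(loc_01, loc_02)]
distance = math.sqrt(sum(squared))

print(squared)             # [0, 10000, 625]
print(round(distance, 2))  # 103.08
```

The Revenue term contributes most of the total here because Revenue is on a larger scale than the other two columns, which is worth keeping in mind when reading the distances.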

Next, iterate over all the pairs and compute the comparison function for each one:

results = [(row1, row2, compare_function(df1.loc[row1], df1.loc[row2]))
      for row1, row2 in pairs]

You now have a list of every pair of locations together with the distance between them.
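To see which locations are most alike, one option is to sort that list by distance. The sketch below is self-contained and rebuilds the data from the question. The compact `compare_function` here is an assumed vectorized variant that takes the Euclidean distance over all columns of the two rows, which for this table means the same three columns.

```python
import itertools

import numpy as np
import pandas as pd

# Data from the question's table, indexed by location label.
df1 = pd.DataFrame(
    {
        "Units Sold": [100, 100, 90, 120, 115, 89, 110],
        "Revenue": [1150, 1250, 990, 1200, 1035, 1157, 1265],
        "Footfall": [85, 60, 90, 98, 87, 74, 80],
    },
    index=pd.Index([f"Loc - {i:02d}" for i in range(1, 8)], name="Location"),
)

def compare_function(row1, row2):
    # Euclidean distance over every column of the two rows.
    return float(np.sqrt(((row1 - row2) ** 2).sum()))

pairs = itertools.combinations(df1.index, 2)
results = [(a, b, compare_function(df1.loc[a], df1.loc[b])) for a, b in pairs]

# Smallest distance first, so the most similar pairs come out on top.
results.sort(key=lambda item: item[2])

closest = results[0]
print(closest[0], closest[1], round(closest[2], 2))  # Loc - 01 Loc - 06 17.06
```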
