What is the best way to visually show the similarity between lists?

# instance 1
I1 = [['cat', 'dog', 'bob'],  # 1st second
      ['eel', 'pug', 'emu'],  # 2nd second
      ['owl', 'yak', 'elk']]  # 3rd second

# instance 2
I2 = [['dog', 'fox', 'rat'],  # 1st second
      ['emu', 'pug', 'ram'],  # 2nd second
      ['bug', 'bee', 'bob']]  # 3rd second

# instance 3
I3 = [['cat', 'bob', 'fox'],  # 1st second
      ['emu', 'pug', 'eel'],  # 2nd second
      ['bob', 'bee', 'yak']]  # 3rd second

1 Answer

User
#1 · Posted on 2024-05-16 23:31:29

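You can loop over the instances, build your own similarity matrix, and plot it with matplotlib's imshow function. With this approach the similarity is summed across all seconds; if you want it per second instead, you need a three-dimensional similarity matrix. That is doable with a small change to the code below, but you would then need a way other than a single imshow call to visualize it.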
import numpy as np
import matplotlib.pyplot as plt

# instance 1
I1 = [['cat', 'dog', 'bob'],  # 1st second
      ['eel', 'pug', 'emu'],  # 2nd second
      ['owl', 'yak', 'elk']]  # 3rd second

# instance 2
I2 = [['dog', 'fox', 'rat'],  # 1st second
      ['emu', 'pug', 'ram'],  # 2nd second
      ['bug', 'bee', 'bob']]  # 3rd second

# instance 3
I3 = [['cat', 'bob', 'fox'],  # 1st second
      ['emu', 'pug', 'eel'],  # 2nd second
      ['bob', 'bee', 'yak']]  # 3rd second

total = [I1, I2, I3]

# initialize similarity matrix by number of instances you have
sim_matrix = np.zeros(shape=(len(total), len(total)))

# number of seconds per instance (constant per your explanation)
N = 3

# for each row in sim matrix
for i in range(len(total)):
    # for each column in sim matrix
    for j in range(len(total)):
        if i == j:
            # comparing an instance with itself: similarity is the total
            # number of strings across all seconds (may not be constant)
            sim_matrix[i, j] = sum(len(t) for t in total[i])
        else:
            # sum the sizes of the set intersections at each second
            sim_matrix[i, j] = sum(len(set(total[i][s]) & set(total[j][s]))
                                   for s in range(N))
sim_matrix should then be
array([[9., 3., 6.],
       [3., 9., 5.],
       [6., 5., 9.]])
You can plot it with imshow
plt.imshow(sim_matrix)
plt.colorbar()
plt.show()
There are almost certainly better and more efficient ways to do this, but if you only have a small number of lists, this is probably fine.
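As one possible tidier variant, here is a minimal sketch that wraps the same counting logic in a function and relies on the matrix being symmetric, so each pair is compared only once. The name `similarity_matrix` is made up for illustration, and the sketch assumes every instance has the same number of seconds (`zip` would silently stop at the shorter one otherwise).

```python
from itertools import combinations


def similarity_matrix(instances):
    """Count strings shared at the same second, summed over all seconds."""
    n = len(instances)
    matrix = [[0] * n for _ in range(n)]

    # Diagonal: total number of strings in the instance, as in the loop version.
    for i, instance in enumerate(instances):
        matrix[i][i] = sum(len(second) for second in instance)

    # Off-diagonal: the matrix is symmetric, so visit each pair only once.
    for i, j in combinations(range(n), 2):
        shared = sum(len(set(a) & set(b))
                     for a, b in zip(instances[i], instances[j]))
        matrix[i][j] = matrix[j][i] = shared

    return matrix


I1 = [['cat', 'dog', 'bob'], ['eel', 'pug', 'emu'], ['owl', 'yak', 'elk']]
I2 = [['dog', 'fox', 'rat'], ['emu', 'pug', 'ram'], ['bug', 'bee', 'bob']]
I3 = [['cat', 'bob', 'fox'], ['emu', 'pug', 'eel'], ['bob', 'bee', 'yak']]

print(similarity_matrix([I1, I2, I3]))
# [[9, 3, 6], [3, 9, 5], [6, 5, 9]]
```

The result is a plain nested list, which imshow accepts directly, so the plotting step stays the same.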
Edit
If you need a similarity matrix for each second, you can use the following modified code
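# first axis indexes the second, the other two index the instances
sim_matrix = np.zeros(shape=(N, len(total), len(total)))

for i in range(len(total)):
    for j in range(len(total)):
        if i == j:
            # comparing an instance with itself: number of strings at each second
            sim_matrix[:, i, j] = [len(t) for t in total[i]]
        else:
            # size of the set intersection at each second
            sim_matrix[:, i, j] = [len(set(total[i][s]) & set(total[j][s]))
                                   for s in range(N)]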
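You can still pass the three-dimensional similarity matrix to imshow, but it will interpret the slices as RGB color channels (imshow treats an array with a last axis of length 3 as an RGB image), so you do not get one heatmap per second.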
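One way around the RGB interpretation, offered only as a sketch, is to draw each second's matrix in its own subplot on a shared color scale. The helper names `per_second_similarity` and `plot_per_second` are made up for illustration. Unlike the code above, the diagonal here comes from the same set intersection as every other cell, so it counts unique strings; that matches the original values only when a second contains no duplicates.

```python
import numpy as np
import matplotlib.pyplot as plt

I1 = [['cat', 'dog', 'bob'], ['eel', 'pug', 'emu'], ['owl', 'yak', 'elk']]
I2 = [['dog', 'fox', 'rat'], ['emu', 'pug', 'ram'], ['bug', 'bee', 'bob']]
I3 = [['cat', 'bob', 'fox'], ['emu', 'pug', 'eel'], ['bob', 'bee', 'yak']]


def per_second_similarity(instances, n_seconds):
    """Return an array of shape (n_seconds, n_instances, n_instances)."""
    n = len(instances)
    sim = np.zeros((n_seconds, n, n))
    for i in range(n):
        for j in range(n):
            for s in range(n_seconds):
                # Number of strings the two instances share at second s.
                sim[s, i, j] = len(set(instances[i][s]) & set(instances[j][s]))
    return sim


def plot_per_second(sim):
    """Draw one heatmap per second, all on the same color scale."""
    n_seconds = sim.shape[0]
    fig, axes = plt.subplots(1, n_seconds, figsize=(4 * n_seconds, 4),
                             squeeze=False)
    for s, ax in enumerate(axes[0]):
        image = ax.imshow(sim[s], vmin=0, vmax=sim.max())
        ax.set_title(f"second {s + 1}")
    # A single colorbar is enough because every panel shares vmin and vmax.
    fig.colorbar(image, ax=axes)
    return fig


sim = per_second_similarity([I1, I2, I3], n_seconds=3)
fig = plot_per_second(sim)
```

Call plt.show() afterwards to display the panels. Summing `sim` over its first axis gives back the aggregated matrix from the first part of the answer.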
