Building a symmetric NumPy matrix from a list of pairs

Posted 2024-04-19 13:15:27

I have a correlation matrix, but it is specified as pairs, like this:

cm = pd.DataFrame({'name1': ['A', 'A', 'B'], 
                   'name2': ['B', 'C', 'C'], 
                   'corr': [0.1, 0.2, 0.3]})
cm
    name1   name2   corr
0   A       B       0.1
1   A       C       0.2
2   B       C       0.3

What is the simplest way to convert it into a correlation matrix stored as a 2D NumPy array?

    A   B   C
A 1.0 0.1 0.2
B 0.1 1.0 0.3
C 0.2 0.3 1.0

3 answers

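Assuming the last column is ordered appropriately (pairs listed row by row across the upper triangle), we can use the following code: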

import pandas as pd
import numpy as np

# define data frame
data = pd.DataFrame({
    'name1': ['A', 'A', 'B'],
    'name2': ['B', 'C', 'C'],
    'correlation': [0.1, 0.2, 0.3]})

# get correlation column; n names give n*(n-1)/2 pairs, so solve for n
correlation = data['correlation'].values
n_pairs = correlation.shape[0]
dimension = int(round((1 + np.sqrt(1 + 8 * n_pairs)) / 2))

# define zero matrix to fill
matrix_upper_triangular = np.zeros((dimension, dimension))

# fill upper triangular matrix with one half at diagonal
counter = 0
for (row, column), element in np.ndenumerate(matrix_upper_triangular):
    # half of diagonal terms
    if row == column:
        matrix_upper_triangular[row, column] = 0.5
    # upper triangular values
    elif row < column:
        matrix_upper_triangular[row, column] = correlation[counter]
        counter = counter + 1
    else:
        pass

# add upper triangular + lower triangular matrix (builds a new array)
correlation_matrix = matrix_upper_triangular + matrix_upper_triangular.transpose()
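
If the pairs are not guaranteed to arrive in upper-triangle order, one possible variant is to look up each name's position and write the values directly. This is only a sketch: the names `names`, `position`, `rows`, `cols` and `corr_matrix` are illustrative, and it assumes each pair appears once and that the diagonal should be 1.

```python
import numpy as np
import pandas as pd

cm = pd.DataFrame({'name1': ['A', 'A', 'B'],
                   'name2': ['B', 'C', 'C'],
                   'corr': [0.1, 0.2, 0.3]})

# Sorted list of every name that appears in either column (illustrative helper names).
names = sorted(set(cm['name1']) | set(cm['name2']))
position = {name: i for i, name in enumerate(names)}

# Integer row/column index of each pair.
rows = cm['name1'].map(position).to_numpy()
cols = cm['name2'].map(position).to_numpy()

# Start from the identity so the diagonal is 1, then write each pair on both sides.
corr_matrix = np.eye(len(names))
corr_matrix[rows, cols] = cm['corr'].to_numpy()
corr_matrix[cols, rows] = cm['corr'].to_numpy()

print(corr_matrix)
```

Because the positions come from the names themselves, the row order of `cm` does not matter here.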

Not sure about pure NumPy, since you are working with a DataFrame. Here is a pure pandas solution:

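# pivot's arguments are keyword-only in recent pandas versions
s = cm.pivot(index='name1', columns='name2', values='corr')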

ret = s.add(s.T, fill_value=0).fillna(1)

Output:

     A    B    C
A  1.0  0.1  0.2
B  0.1  1.0  0.3
C  0.2  0.3  1.0
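
Since the question asks for a 2D NumPy array, the pandas result can be converted at the end. The following is a minimal sketch, assuming the same `cm` as in the question; the `labels` variable and the explicit `reindex` are additions here to make sure rows and columns share one order before converting.

```python
import numpy as np
import pandas as pd

cm = pd.DataFrame({'name1': ['A', 'A', 'B'],
                   'name2': ['B', 'C', 'C'],
                   'corr': [0.1, 0.2, 0.3]})

s = cm.pivot(index='name1', columns='name2', values='corr')
ret = s.add(s.T, fill_value=0).fillna(1)

# Put rows and columns in the same (sorted) order before dropping the labels.
labels = sorted(ret.index.union(ret.columns))
ret = ret.reindex(index=labels, columns=labels)

arr = ret.to_numpy()  # plain 2D NumPy array

# Sanity check: a correlation matrix should equal its transpose.
print(np.allclose(arr, arr.T))
```

Note that `fillna(1)` fills every missing entry with 1, not just the diagonal, so this assumes all off-diagonal pairs are present in `cm`.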

Extra: for the reverse direction (with ret as above):

(ret.where(np.triu(np.ones(ret.shape, dtype=bool),1))
    .stack()
    .reset_index(name='corr')
)

Output:

  level_0 level_1  corr
0       A       B   0.1
1       A       C   0.2
2       B       C   0.3

One approach is to build a graph with NetworkX, setting the corr column as the edge weight, and then use nx.to_pandas_adjacency to get the adjacency matrix:

import numpy as np
import networkx as nx
G = nx.from_pandas_edgelist(cm.rename(columns={'corr':'weight'}), 
                            source='name1', 
                            target='name2', 
                            edge_attr ='weight')

G.edges(data=True)
# EdgeDataView([('A', 'B', {'weight': 0.1}), ('A', 'C', {'weight': 0.2}), 
#               ('B', 'C', {'weight': 0.3})])

adj = nx.to_pandas_adjacency(G)
# sets the diagonal to 1 (node can't be connected to itself)
adj[:] = adj.values + np.eye(adj.shape[0])

print(adj)

    A    B    C
A  1.0  0.1  0.2
B  0.1  1.0  0.3
C  0.2  0.3  1.0
