The correct way to compute correlations between factors

2 answers

User

Answer 1 · edited 2024-05-15 06:16:23

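For orthogonal rotations (or no rotation at all), FactorAnalyzer does not provide a factor correlation matrix, because it would simply be an identity matrix. For oblique rotations, you can get the factor correlation matrix from the phi_ attribute.

In this respect the FactorAnalyzer package works the same way as R's psych package. For example,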

library(tidyverse)
library(psych)

data(bfi)
bfi_subset = bfi %>% select(matches('^A[1-5]|^N'))
bfi_subset = bfi_subset %>% 
   mutate_all(~ifelse(is.na(.), median(., na.rm = TRUE), .))

model = fa(bfi_subset, 4, rotate = 'varimax')
print(model$Phi)
print(round(cor(model$scores), 3))

This results in:

NULL
       MR2    MR1    MR3    MR4
MR2  1.000 -0.044 -0.004 -0.283
MR1 -0.044  1.000  0.297 -0.014
MR3 -0.004  0.297  1.000  0.116
MR4 -0.283 -0.014  0.116  1.000

Looking at the correlations of the factor scores will not necessarily behave the way you expect. For example,

round(cor(model$scores), 3)

yields:

       MR2    MR1    MR3    MR4
MR2  1.000 -0.044 -0.004 -0.283
MR1 -0.044  1.000  0.297 -0.014
MR3 -0.004  0.297  1.000  0.116
MR4 -0.283 -0.014  0.116  1.000
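One way to see why the score correlations are not an identity matrix even though the factors themselves are orthogonal is to work it out at the population level. The sketch below is only an illustration: the loadings are made up (they are not the bfi estimates), and it assumes regression-style score weights W = R^{-1} L, for which the covariance of the estimated scores is L^T R^{-1} L rather than the identity.

```python
import numpy as np

# Hypothetical loadings for 6 variables on 2 orthogonal factors
# (made up for illustration, not estimated from the bfi data).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.2, 0.7],
    [0.1, 0.8],
    [0.3, 0.6],
])

# Model-implied correlation matrix with orthogonal factors (Phi = I):
# R = L L^T + diag(uniquenesses), which has a unit diagonal.
common = loadings @ loadings.T
uniquenesses = 1.0 - np.diag(common)
corr = common + np.diag(uniquenesses)

# Regression-style score weights: W = R^{-1} L.
weights = np.linalg.solve(corr, loadings)

# Covariance of the estimated scores: W^T R W = L^T R^{-1} L.
score_cov = weights.T @ corr @ weights

# Rescale the covariance to a correlation matrix.
d = np.sqrt(np.diag(score_cov))
score_corr = score_cov / np.outer(d, d)
print(score_corr.round(3))
```

The off-diagonal entry is non-zero here, so correlated score estimates do not by themselves indicate correlated factors.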

If you want to convince yourself that the factors really are orthogonal (uncorrelated), you can do the following in FactorAnalyzer:

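import pandas as pd
from factor_analyzer import FactorAnalyzer

# df_sub is assumed to hold the same bfi subset as in the R example
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(df_sub)
th = fa.rotation_matrix_
print(pd.DataFrame(th.T.dot(th)).round(2))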

which outputs:

     0    1    2    3
0  1.0  0.0 -0.0 -0.0
1  0.0  1.0  0.0 -0.0
2 -0.0  0.0  1.0  0.0
3 -0.0 -0.0  0.0  1.0
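The same check can be tried without fitting a model. The sketch below uses random matrices rather than a fitted rotation, and it assumes the common convention in which the factor correlation matrix of an oblique solution is T^T T for a transformation T with unit-length columns; whether a particular library stores its rotation matrix in exactly this form is an assumption to verify.

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthogonal matrix, taken from the QR decomposition of random numbers.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
orthogonal_check = q.T @ q
print(np.round(orthogonal_check, 2))  # identity matrix: uncorrelated factors

# A non-orthogonal transformation with unit-length columns,
# standing in for what an oblique rotation would produce.
t = rng.normal(size=(4, 4))
t /= np.linalg.norm(t, axis=0)
phi_like = t.T @ t
print(np.round(phi_like, 2))  # unit diagonal, non-zero off-diagonals
```

For the orthogonal case the product collapses to the identity, which is why nothing is lost by not reporting a factor correlation matrix there.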

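User

Answer 2 · edited 2024-05-15 06:16:23

One thing to add here: when you use the transform() method, you are computing factor scores. There are different ways to do this. FactorAnalyzer implements the Thurstone method, but there are other methods that preserve the latent correlations, such as the "ten Berge" method.

If this is something you would like us to implement in FactorAnalyzer, feel free to open an issue. Here is a rough implementation: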

import warnings
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from sklearn.preprocessing import scale


def ten_berge(X, loadings, phi=None):
    """
    Estimate factor scores using the "ten Berge" method.

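    Parameters
    ----------
    X : array-like
        The data set
    loadings : array-like
        The loadings matrix
    phi : array-like, optional
        The factor correlation matrix; if None, an identity matrix is used

    Reference
    ---------
    https://www.sciencedirect.com/science/article/pii/S0024379597100076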
    """
    # get the number of factors from the loadings
    n_factors = loadings.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    # if `phi` is None, use an identity matrix (orthogonal factors)
    phi = np.diag(np.ones(n_factors)) if phi is None else phi
    # calculate intermediate metrics
    load = loadings.dot(matrix_sqrt(phi))
    corr_inv = inv_matrix_sqrt(corr)
    temp = corr_inv.dot(load)\
                   .dot(inv_matrix_sqrt(load.T.dot(np.linalg.inv(corr))
                                              .dot(load)))
    # calculate weights
    weights = corr_inv.dot(temp)\
                      .dot(matrix_sqrt(phi))
    # calculate scores, given weights
    scores = scale(X).dot(weights)
    return scores


def matrix_sqrt(x):
    """
    Compute the square root of the eigen values (eVal),
    and then take $eVec * diag(eVals^0.5) * eVec^T$
    """
    evals, evecs = np.linalg.eig(x)
    evals[evals < 0] = np.finfo(float).eps
    sqrt_evals = np.sqrt(evals)
    return evecs.dot(np.diag(sqrt_evals)).dot(evecs.T)


def inv_matrix_sqrt(x):
    """
    Compute the inverse square root of the eigen values (eVal),
    and then take $eVec * diag(1 / eVals^0.5) * eVec^T$
    """
    evals, evecs = np.linalg.eig(x)
    if np.iscomplex(evals).any():
        warnings.warn('Complex eigen values detected; results are suspect.')
        return x
    evals[evals < np.finfo(float).eps] = 100 * np.finfo(float).eps
    inv_sqrt_evals = 1 / np.sqrt(evals)
    return evecs.dot(np.diag(inv_sqrt_evals)).dot(evecs.T)

df = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/psych/bfi.csv')
df = df.filter(regex='^A[1-5]|^N').copy()
df = df.fillna(df.median(0))

fa = FactorAnalyzer(n_factors=5, rotation=None).fit(df)
pd.DataFrame(ten_berge(df, fa.loadings_)).corr().round(3)
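To check the claim that this kind of weighting preserves the latent correlations, here is a small NumPy-only sketch of the same algebra. Everything in it is an assumption made for illustration: the data are synthetic, the loadings and phi are made up rather than estimated, and sym_power is a hypothetical helper that uses np.linalg.eigh in place of the eig-based helpers above.

```python
import numpy as np


def sym_power(m, power):
    """Raise a symmetric positive-definite matrix to a real power via eigh."""
    evals, evecs = np.linalg.eigh(m)
    return (evecs * evals ** power) @ evecs.T


rng = np.random.default_rng(0)

# Synthetic data: 500 observations of 6 variables that share one common component.
X = rng.normal(size=(500, 6)) + rng.normal(size=(500, 1))
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize, as scale() does
corr = np.corrcoef(X, rowvar=False)

# Made-up loadings and target factor correlations (not estimated from X).
loadings = rng.uniform(0.2, 0.8, size=(6, 2))
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Same steps as ten_berge(), written with sym_power().
load = loadings @ sym_power(phi, 0.5)
corr_inv_sqrt = sym_power(corr, -0.5)
temp = corr_inv_sqrt @ load @ sym_power(load.T @ np.linalg.inv(corr) @ load, -0.5)
weights = corr_inv_sqrt @ temp @ sym_power(phi, 0.5)

# The correlations of the resulting scores should match phi.
scores = Z @ weights
score_corr = np.corrcoef(scores, rowvar=False)
print(score_corr.round(3))
```

Because the weights satisfy W^T R W = phi by construction, the score correlations reproduce phi up to floating-point error, whatever loadings are supplied; with phi left as the identity, the scores come out uncorrelated.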
