Correlation between binary variables in pandas

import numpy as np
import scipy.stats as ss

def cramers_corrected_stat(confusion_matrix):
    # confusion_matrix is expected to be a 2-D NumPy array of counts
    chi2 = ss.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum()
    phi2 = chi2/n
    r,k = confusion_matrix.shape
    # Bias-corrected phi^2 and table dimensions
    phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))
    rcorr = r - ((r-1)**2)/(n-1)
    kcorr = k - ((k-1)**2)/(n-1)
    return np.sqrt(phi2corr / min( (kcorr-1), (rcorr-1)))

      CL  UP  NS  P  CL_S
480    1   0   1  0     1
1232   1   0   1  0     1
2308   1   1   1  0     1
1590   1   0   1  0     1
497    1   1   0  0     1
...   ...  ...  ... ...  ...
1066   1   1   1  0     1
1817   1   0   1  0     1
2411   1   1   1  0     1
2149   1   0   1  0     1
1780   1   0   1  0     1

1 answer

User

#1 · Posted 2024-06-16 11:49:50

The function you wrote is not suitable for your dataset.

Instead, use the cramers_V(var1, var2) function given below.

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_V(var1, var2):
    # Build the contingency table of the two variables
    crosstab = np.array(pd.crosstab(var1, var2, rownames=None, colnames=None))
    stat = chi2_contingency(crosstab)[0]  # Chi-squared test statistic
    obs = np.sum(crosstab)  # Number of observations
    mini = min(crosstab.shape) - 1  # min(rows, columns) - 1
    # Cramér's V is the square root of chi2 / (n * min(r - 1, k - 1))
    return np.sqrt(stat / (obs * mini))
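
For reference, the usual textbook definition of Cramér's V for an r × k contingency table with n observations is given below. The symbols are not from the original post: χ² corresponds to `stat`, n to `obs`, and min(r − 1, k − 1) to `mini` in the function above.

$$
V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\; k - 1)}}
$$

For two binary variables the table is 2 × 2, so min(r − 1, k − 1) = 1 and V reduces to sqrt(χ²/n), which is the absolute value of the phi coefficient when χ² is computed without a continuity correction.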

Example code using this function:

cramers_V(df["CL"], df["NS"])
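
One thing to watch for with data like the sample in the question: if a column takes only one value, the contingency table has a single row or column and min(r − 1, k − 1) is 0, so the ratio is undefined. The sketch below is one possible way to handle that, not part of the original answer. The name `cramers_v_safe`, the `correction` parameter default, and the `toy` DataFrame are all assumptions made for illustration. `correction=False` switches off the Yates continuity correction that `chi2_contingency` applies to 2 × 2 tables by default.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_safe(var1, var2, correction=False):
    """Cramér's V for two categorical Series; NaN if either one is constant.

    Hypothetical helper for illustration, not the original answer's function.
    """
    crosstab = pd.crosstab(var1, var2).to_numpy()
    mini = min(crosstab.shape) - 1
    if mini == 0:
        # A constant column gives a 1 x k or r x 1 table: V is undefined
        return float("nan")
    stat = chi2_contingency(crosstab, correction=correction)[0]
    obs = crosstab.sum()
    return float(np.sqrt(stat / (obs * mini)))

# Made-up data: "b" duplicates "a", and "c" is constant
toy = pd.DataFrame({
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 0, 1, 1, 0, 1, 0, 1],
    "c": [1, 1, 1, 1, 1, 1, 1, 1],
})

v_ab = cramers_v_safe(toy["a"], toy["b"])  # perfectly associated columns
v_ac = cramers_v_safe(toy["a"], toy["c"])  # constant column, undefined
```

Returning NaN rather than raising keeps a loop over many column pairs running; whether that is the right choice depends on how the results are consumed.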

To compute it for every possible pair of columns in the dataset, use the following code:

import itertools
for col1, col2 in itertools.combinations(df.columns, 2):
    print(col1, col2, cramers_V(df[col1], df[col2]))
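
If a table is more convenient than printed lines, the pairwise values can be collected into a symmetric DataFrame. This is a minimal sketch under stated assumptions: `cramers_v` and `cramers_v_matrix` are hypothetical names, the helper returns NaN for constant columns (a choice made here, not in the original answer), and the diagonal is left as NaN because a column is never paired with itself.

```python
import itertools
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(var1, var2):
    """Cramér's V with SciPy's default settings; NaN for constant columns."""
    crosstab = pd.crosstab(var1, var2).to_numpy()
    mini = min(crosstab.shape) - 1
    if mini == 0:
        return float("nan")
    stat = chi2_contingency(crosstab)[0]
    return float(np.sqrt(stat / (crosstab.sum() * mini)))

def cramers_v_matrix(frame):
    """Return a symmetric DataFrame of pairwise Cramér's V values."""
    cols = list(frame.columns)
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for c1, c2 in itertools.combinations(cols, 2):
        v = cramers_v(frame[c1], frame[c2])
        out.loc[c1, c2] = v
        out.loc[c2, c1] = v  # mirror into the lower triangle
    return out

# Made-up data for illustration only
sample = pd.DataFrame({
    "x": [0, 0, 1, 1, 0, 1, 0, 1],
    "y": [0, 1, 1, 1, 0, 1, 0, 0],
    "z": [1, 1, 1, 1, 1, 1, 1, 1],
})
matrix = cramers_v_matrix(sample)
```

With the question's data this would be called as `cramers_v_matrix(df)`.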
