Calculating the Pearson correlation coefficient and its significance in Python

224 votes
16 answers
456640 views
Asked 2025-04-16 05:35

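I'm looking for a function that takes two lists as input and returns their Pearson correlation coefficient together with the statistical significance of that correlation.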

16 Answers

62

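An alternative is to use a native SciPy function, linregress, which computes the following:

slope: the slope of the regression line

intercept: the intercept of the regression line

r-value: the Pearson correlation coefficient, which measures the strength of the linear relationship between the two variables

p-value: the two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero. In a simple linear regression a zero slope is equivalent to zero correlation, so this is also the significance of the correlation.

stderr: the standard error of the estimated slope

Here is an example: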

from scipy.stats import linregress

a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
linregress(a, b)

This returns:

LinregressResult(slope=0.20833333333333337, intercept=13.375, rvalue=0.14499815458068521, pvalue=0.68940144811669501, stderr=0.50261704627083648)
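
To get only the two values the question asks for, the fields of the result can be read by name. This is a minimal sketch that reuses the sample lists from this answer; the variable names `result`, `r` and `p` are just illustrative.

```python
from scipy.stats import linregress

a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]

result = linregress(a, b)

# rvalue is the Pearson correlation coefficient; pvalue tests the
# null hypothesis of zero slope (equivalently, zero correlation).
r = result.rvalue
p = result.pvalue

print(f"r = {r:.4f}, p = {p:.4f}")  # r = 0.1450, p = 0.6894
```

With these sample lists the correlation is weak and far from significant at the usual 0.05 level.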
120

The Pearson correlation coefficient can be computed with NumPy's corrcoef.

import numpy
numpy.corrcoef(list1, list2)[0, 1]
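
Note that corrcoef returns the full 2x2 correlation matrix (hence the `[0, 1]` index) and does not report a significance level. As a rough sketch, the two-sided p-value can be recovered from r with the standard t-test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sample data here is borrowed from another answer in this thread, and using `scipy.stats.t` for the tail probability is my own choice, not part of the original answer.

```python
import numpy as np
from scipy.stats import t as student_t

# Sample data borrowed from the linregress answer (illustrative only).
list1 = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
list2 = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]

# Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
r = np.corrcoef(list1, list2)[0, 1]

# Two-sided p-value from the t statistic with n - 2 degrees of freedom.
# This assumes |r| < 1; a perfect correlation would divide by zero.
n = len(list1)
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
p = 2 * student_t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.4f}, p = {p:.4f}")
```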
218

Have a look at scipy.stats, which covers a lot of statistics functionality:

from pydoc import help
from scipy.stats import pearsonr  # older code used scipy.stats.stats, now deprecated
help(pearsonr)

Output:

>>>
Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
 Calculates a Pearson correlation coefficient and the p-value for testing
 non-correlation.

 The Pearson correlation coefficient measures the linear relationship
 between two datasets. Strictly speaking, Pearson's correlation requires
 that each dataset be normally distributed. Like other correlation
 coefficients, this one varies between -1 and +1 with 0 implying no
 correlation. Correlations of -1 or +1 imply an exact linear
 relationship. Positive correlations imply that as x increases, so does
 y. Negative correlations imply that as x increases, y decreases.

 The p-value roughly indicates the probability of an uncorrelated system
 producing datasets that have a Pearson correlation at least as extreme
 as the one computed from these datasets. The p-values are not entirely
 reliable but are probably reasonable for datasets larger than 500 or so.

 Parameters
 ----------
 x : 1D array
 y : 1D array the same length as x

 Returns
 -------
 (Pearson's correlation coefficient,
  2-tailed p-value)

 References
 ----------
 http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation
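
Going by the docstring above, a call returns the coefficient and the two-tailed p-value, which can be unpacked directly. A minimal sketch, with sample lists borrowed from another answer in this thread:

```python
from scipy.stats import pearsonr

# Sample data (illustrative only); both lists must have the same length.
x = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
y = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]

# pearsonr returns (correlation coefficient, two-tailed p-value).
r, p = pearsonr(x, y)

print(f"Pearson r = {r:.4f}, two-tailed p = {p:.4f}")
```

A small p-value (commonly below 0.05) suggests the correlation is unlikely under the no-correlation hypothesis. As the docstring cautions, the p-value is less reliable for small samples.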
