How to compute the "t-test" statistic with numpy
I'm looking to generate some statistics about a model I created in Python. I'd like to run a t-test on it, but I'm wondering if there is an easy way to do this with numpy or scipy. Are there any good explanations around?
For example, I have three related datasets that look like this:
[55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]
Now, I would like to run a Student's t-test on them.
3 Answers
Once you get your t-value, you may wonder how to interpret it as a probability (I did). Here is a function I wrote to help with that.
It is based on information I gleaned from http://www.vassarstats.net/rsig.html and http://en.wikipedia.org/wiki/Student%27s_t_distribution.
from scipy import stats

# Given (possibly random) variables, X and Y, and a correlation direction,
# returns:
#  (r, p),
# where r is the Pearson correlation coefficient, and p is the probability
# of getting the observed values if there is actually no correlation in the given
# direction.
#
# direction:
#  if positive, p is the probability of getting the observed result when there is no
#     positive correlation in the normally distributed full populations sampled by X
#     and Y
#  if negative, p is the probability of getting the observed result when there is no
#     negative correlation
#  if 0, p is the probability of getting your result, if your hypothesis is true that
#     there is no correlation in either direction
def probabilityOfResult(X, Y, direction=0):
    x = len(X)
    if x != len(Y):
        raise ValueError("variables not same len: " + str(x) + ", and " +
                         str(len(Y)))
    if x < 6:
        raise ValueError("must have at least 6 samples, but have " + str(x))
    (corr, prb_2_tail) = stats.pearsonr(X, Y)
    if not direction:
        return (corr, prb_2_tail)
    prb_1_tail = prb_2_tail / 2
    if corr * direction > 0:
        return (corr, prb_1_tail)
    return (corr, 1 - prb_1_tail)
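
For example, with two equally long samples (the numbers here are reused from the question and the other answer purely for illustration), a call could look like this:

import numpy as np

X = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
Y = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])

# two-tailed: probability of seeing this correlation if there is actually none
r, p_two_tail = probabilityOfResult(X, Y)
# one-tailed: probability of seeing this result if there is no positive correlation
r, p_one_tail = probabilityOfResult(X, Y, direction=1)
print(r, p_two_tail, p_one_tail)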
van's answer using scipy is correct, and the scipy.stats.ttest_* functions are very convenient.
However, I came to this page looking for a pure numpy solution, as the title suggests, to avoid the scipy dependency. To this end, let me point to the example given here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.standard_t.html
The main problem is that numpy does not provide cumulative distribution functions, hence my conclusion is that you should really use scipy. Still, using only numpy is possible:
From the original question I am guessing that you want to compare your datasets and judge with a t-test whether there is a significant deviation, and that the samples are paired (see https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests).
In that case, you can calculate the t- and p-value like so:
import numpy as np

sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])

# paired samples -> under the null hypothesis the difference has mean 0
difference = sample1 - sample2
# the t-value is easily computed with numpy
t = np.mean(difference) / (difference.std(ddof=1) / np.sqrt(len(difference)))
# unfortunately, numpy does not have a built-in CDF,
# so here is a ridiculous work-around: estimate it by sampling
# (a paired test with n differences has n - 1 degrees of freedom)
s = np.random.standard_t(len(difference) - 1, size=100000)
p = np.sum(s < t) / float(len(s))
# using a two-sided test
print("There is a {} % probability that the paired samples stem from distributions with the same means.".format(2 * min(p, 1 - p) * 100))
This prints something like: There is a 73.028 % probability that the paired samples stem from distributions with the same means. (The exact number varies from run to run because of the random sampling.)
Since this probability is far above any sensible significance level (say, 5 %), you should not conclude anything about the concrete case.
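
If pulling in scipy is acceptable after all, the same paired test can be cross-checked with scipy.stats.ttest_rel, which uses the exact t-distribution CDF instead of the sampling work-around (a minimal sketch with the same two samples):

import numpy as np
from scipy import stats

sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])

# paired (related-samples) t-test: is the mean of the differences zero?
t, p = stats.ttest_rel(sample1, sample2)
print("t = {:.3f}, two-sided p = {:.3f}".format(t, p))

The t-value should match the numpy computation above, and the p-value should be close to the sampled estimate.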
In the scipy.stats package there are several functions whose names begin with ttest_... You can find examples here:
>>> print('t-statistic = %6.3f pvalue = %6.4f' % stats.ttest_1samp(x, m))
t-statistic = 0.391 pvalue = 0.6955
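
Applied to the data from the question, a one-sample test against a hypothesized mean could look like this (the reference mean of 50.0 is made up for illustration, it is not given in the question):

import numpy as np
from scipy import stats

x = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
# one-sample t-test: does the sample mean differ from the hypothesized mean?
t_stat, p_value = stats.ttest_1samp(x, 50.0)
print('t-statistic = %6.3f pvalue = %6.4f' % (t_stat, p_value))

For comparing two of the related datasets directly, stats.ttest_ind (independent samples) and stats.ttest_rel (paired samples) follow the same calling pattern.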