Unexpected distribution of p-values from a t-test

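import matplotlib.pyplot as plt  # missing in the original snippet; needed for plt.hist
import numpy as np
from scipy import stats

ps = []
for i in range(5000):
    # Two independent samples drawn from the same N(0, 1) distribution
    gaussian_numbers = np.random.normal(0, 1, size=100000)
    gaussian_numbers2 = np.random.normal(0, 1, size=100000)
    t, p = stats.ttest_ind(gaussian_numbers, gaussian_numbers2, equal_var=True)
    ps.append(p)

# Histogram of the 5000 p-values, 100 bins
plt.hist(ps, 100)
plt.show()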

2 Answers

User

Answer 1 · edited 2024-04-29 04:24:23

"So I'd expect that the t-test results in relatively high p-values, or a tendency toward high p-values."

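Your expectation is incorrect. Your inputs satisfy the null hypothesis of the t-test: they are drawn from populations with the same mean. In general, when you perform a hypothesis test such as the t-test and the inputs satisfy the null hypothesis, the distribution of the p-value is uniform on the interval [0, 1]. So your plot is the expected result of repeating the test many times.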
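One way to check the uniformity claim numerically is sketched below. It is not the asker's code: the sample sizes are reduced, the seed is fixed, and the names `n_tests`, `n_obs`, `a`, `b` and `ks` are chosen here for illustration. It compares the empirical fraction of p-values below a few thresholds with the thresholds themselves, then runs a Kolmogorov-Smirnov test against Uniform(0, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Assumed sizes, smaller than in the question so this runs quickly
n_tests, n_obs = 2000, 50

# Each row is one experiment; both groups come from the same N(0, 1) population
a = rng.normal(0, 1, size=(n_tests, n_obs))
b = rng.normal(0, 1, size=(n_tests, n_obs))

# With axis=1, ttest_ind runs one two-sample test per row
_, ps = stats.ttest_ind(a, b, axis=1, equal_var=True)

# Under H0 the fraction of p-values at or below a threshold u should be close to u
for u in (0.05, 0.25, 0.5, 0.75):
    print(f"fraction of p-values <= {u}: {np.mean(ps <= u):.3f}")

# Kolmogorov-Smirnov test of the p-values against Uniform(0, 1)
ks = stats.kstest(ps, "uniform")
print("KS statistic:", ks.statistic, "KS p-value:", ks.pvalue)
```

A small KS statistic means the empirical distribution of the p-values stays close to the straight-line CDF of a uniform distribution, which is what a flat histogram shows visually.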

User

Answer 2 · edited 2024-04-29 04:24:23

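You draw two random samples from the same distribution and compute the t-statistic to test the null hypothesis that the means are equal.

Because the samples are random, there is no reason for the p-values to cluster closer to 1. To see why, think about confidence intervals.

A confidence interval tells you that, over repeated sampling, (1-alpha)*100% of the intervals you compute will contain the true parameter. Similarly, when the null hypothesis is true, your p-value falls between 0 and 0.05 about 5% of the time.

In other words: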

# Convert `ps` to numpy array
ps = np.array(ps)
# Check how many times you rejected H0
print('We rejected H0', (ps <= 0.05).sum(), 'times out of', len(ps))
print('We did not reject H0', (ps > 0.05).sum(), 'times out of', len(ps))

We rejected H0 246 times out of 5000
We did not reject H0 4754 times out of 5000
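The confidence-interval analogy in this answer can be simulated in the same spirit. The following is a minimal sketch under assumed settings (a one-sample setup with a known true mean of 0, 2000 experiments of 50 observations each, a fixed seed); none of these values come from the question. It builds a 95% t-interval for the mean in every experiment, counts how often the interval contains the true mean, and checks that an interval misses exactly when the matching one-sample t-test gives p < alpha.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

alpha = 0.05
n_experiments, n_obs = 2000, 50  # assumed sizes, chosen for speed
true_mean = 0.0                  # known here only because we simulate the data

samples = rng.normal(true_mean, 1, size=(n_experiments, n_obs))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n_obs)  # standard error of each mean

# Two-sided t critical value for a (1 - alpha) confidence interval
t_crit = stats.t.ppf(1 - alpha / 2, df=n_obs - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

# Fraction of intervals that contain the true mean; should be near 1 - alpha
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()
print(f"Interval contained the true mean in {coverage:.1%} of experiments")

# The interval misses the true mean exactly when the one-sample t-test rejects
_, p1 = stats.ttest_1samp(samples, true_mean, axis=1)
agreement = np.mean((p1 < alpha) == ~covered)
print(f"'p < alpha' agrees with 'interval misses' in {agreement:.1%} of experiments")
```

The roughly 5% of intervals that miss the true mean are the same experiments that produce p-values below 0.05, which is why a true null hypothesis is still rejected in about 5% of repetitions.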
