p-value is 0 when using scipy.stats.kstest() on large data

[In]: frequencies = mp[['c','v']]
[In]: print frequencies
         c      v
31  3475.8   18.0
30  3475.6   12.0
29  3475.4   13.0
28  3475.2    8.0
20  3475.0   49.0
14  3474.8   69.0
13  3474.6   79.0
12  3474.4   78.0
11  3474.2   78.0
7   3474.0  151.0
6   3473.8  157.0
5   3473.6  129.0
2   3473.4  149.0
1   3473.2  162.0
0   3473.0  179.0
3   3472.8  145.0
4   3472.6  139.0
8   3472.4   95.0
9   3472.2  103.0
10  3472.0  125.0
15  3471.8   56.0
16  3471.6   75.0
17  3471.4   70.0
18  3471.2   70.0
19  3471.0   57.0
21  3470.8   36.0
22  3470.6   22.0
23  3470.4   20.0
24  3470.2   12.0
25  3470.0   23.0
26  3469.8   13.0
27  3469.6   17.0
32  3469.4    6.0

[In]: testData = map(lambda x: np.repeat(x[0], int(x[1])), frequencies.values)
[In]: testData = list(itertools.chain.from_iterable(testData))
[In]: print len(testData)
2415

[In]: print np.unique(testData)
[ 3469.4  3469.6  3469.8  3470.   3470.2  3470.4  3470.6  3470.8  3471.
  3471.2  3471.4  3471.6  3471.8  3472.   3472.2  3472.4  3472.6  3472.8
  3473.   3473.2  3473.4  3473.6  3473.8  3474.   3474.2  3474.4  3474.6
  3474.8  3475.   3475.2  3475.4  3475.6  3475.8]

[In]: scs.kstest(testData, 'norm')
KstestResult(statistic=1.0, pvalue=0.0)

1 answer

User

#1 · Posted 2024-05-12 19:33:05

Passing 'norm' as the input checks whether your data follow scipy.stats.norm.cdf with its default parameters: loc=0, scale=1.
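To see why the statistic comes out as exactly 1.0, it helps to look at what the standard normal CDF returns for values near 3470. The sketch below uses a handful of values taken from the question's range (the `sample` array is illustrative, not the full data set): every value is so far into the right tail of N(0, 1) that the CDF evaluates to 1.0, so the gap between the empirical and reference CDFs reaches its maximum.

```python
import numpy as np
from scipy.stats import norm, kstest

# A few values from the range of the question's data (illustrative subset).
sample = np.array([3469.4, 3472.0, 3473.0, 3474.0, 3475.8])

# Under the standard normal (loc=0, scale=1), each value is thousands of
# standard deviations above the mean, so the CDF saturates at 1.0.
cdf_values = norm.cdf(sample)
print(cdf_values)

# The empirical CDF is 0 just below the smallest observation while the
# reference CDF is already 1 there, so the KS distance is the maximum, 1.
result = kstest(sample, 'norm')
print(result.statistic, result.pvalue)
```

The same saturation happens for all 2415 observations in the question, which is consistent with the reported `statistic=1.0, pvalue=0.0`.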

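Instead, you need to fit a normal distribution to your data and then use the Kolmogorov–Smirnov test to check whether the data are consistent with that fitted distribution.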

import numpy as np
from scipy.stats import norm, kstest
import matplotlib.pyplot as plt

freqs = [[3475.8, 18.0], [3475.6, 12.0], [3475.4, 13.0], [3475.2, 8.0], [3475.0, 49.0],
    [3474.8, 69.0], [3474.6, 79.0], [3474.4, 78.0], [3474.2, 78.0], [3474.0, 151.0],
    [3473.8, 157.0], [3473.6, 129.0], [3473.4, 149.0], [3473.2, 162.0], [3473.0, 179.0],
    [3472.8, 145.0], [3472.6, 139.0], [3472.4, 95.0], [3472.2, 103.0], [3472.0, 125.0],
    [3471.8, 56.0], [3471.6, 75.0], [3471.4, 70.0], [3471.2, 70.0], [3471.0, 57.0],
    [3470.8, 36.0], [3470.6, 22.0], [3470.4, 20.0], [3470.2, 12.0], [3470.0, 23.0],
    [3469.8, 13.0], [3469.6, 17.0], [3469.4, 6.0]]

data = np.hstack([np.repeat(x,int(f)) for x,f in freqs])
loc, scale = norm.fit(data)
# create a normal distribution with loc and scale
n = norm(loc=loc, scale=scale)

Plotting the fitted normal distribution against the data:

[plotting code not preserved in the source]
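The original plotting code is not available, so the following is only a rough sketch of how such a comparison could be drawn, not the answerer's exact figure. It assumes the frequency table is rebuilt as `values` and `counts` arrays, centers one histogram bin on each 0.2-wide value, uses the non-interactive `Agg` backend, and writes to a hypothetical file name `ks_fit.png`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this to view interactively
import matplotlib.pyplot as plt
from scipy.stats import norm

# Frequency table from the question, in ascending order of value.
values = np.round(3469.4 + 0.2 * np.arange(33), 1)
counts = np.array([6, 17, 13, 23, 12, 20, 22, 36, 57, 70, 70, 75, 56, 125,
                   103, 95, 139, 145, 179, 162, 149, 129, 157, 151, 78, 78,
                   79, 69, 49, 8, 13, 12, 18])
data = np.repeat(values, counts)

# Fit a normal distribution to the expanded data.
loc, scale = norm.fit(data)

# One bin per distinct value, centered on it (bin width 0.2).
bin_width = 0.2
edges = np.append(values - bin_width / 2, values[-1] + bin_width / 2)

fig, ax = plt.subplots()
ax.hist(data, bins=edges, density=True, alpha=0.5, label="data")
x = np.linspace(values[0] - 1, values[-1] + 1, 400)
ax.plot(x, norm(loc=loc, scale=scale).pdf(x), label="fitted normal")
ax.set_xlabel("c")
ax.set_ylabel("density")
ax.legend()
fig.savefig("ks_fit.png")  # hypothetical output file name
```

Using `density=True` puts the histogram on the same scale as the fitted PDF, so the two can be compared directly.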

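This is not a very good fit, mostly because of the long tail on the left. However, you can now run a proper Kolmogorov–Smirnov test using the CDF of the fitted normal distribution: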

kstest(data, n.cdf)
# returns:
KstestResult(statistic=0.071276854859734784, pvalue=4.0967451653273201e-11)
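The fitted parameters can also be handed to `kstest` through its `args` parameter instead of a frozen distribution. The sketch below is one way to check that both forms agree; the `values` and `counts` arrays are an assumed re-encoding of the question's frequency table, and the exact p-value may differ slightly between SciPy versions.

```python
import numpy as np
from scipy.stats import norm, kstest

# Frequency table from the question, in ascending order of value.
values = np.round(3469.4 + 0.2 * np.arange(33), 1)
counts = np.array([6, 17, 13, 23, 12, 20, 22, 36, 57, 70, 70, 75, 56, 125,
                   103, 95, 139, 145, 179, 162, 149, 129, 157, 151, 78, 78,
                   79, 69, 49, 8, 13, 12, 18])
data = np.repeat(values, counts)

loc, scale = norm.fit(data)

# Form 1: pass the CDF of a frozen distribution (as in the answer).
res_frozen = kstest(data, norm(loc=loc, scale=scale).cdf)

# Form 2: pass the distribution name plus its parameters via `args`.
res_args = kstest(data, 'norm', args=(loc, scale))

print(res_frozen.statistic, res_args.statistic)
print(res_args.pvalue)
```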

So we still reject the null hypothesis that the data were drawn from the fitted distribution.
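With a sample this large, even a modest KS distance is enough to reject. As a rough illustration, the statistic reported above can be compared with the asymptotic 5% critical value. This sketch assumes the large-sample Kolmogorov distribution (`scipy.stats.kstwobign`) is an adequate approximation and ignores the fact that the parameters were estimated from the same data, so it is an approximate guide rather than an exact test.

```python
import numpy as np
from scipy.stats import kstwobign

n_samples = 2415                    # sample size from the question
statistic = 0.071276854859734784    # KS statistic reported in the answer
alpha = 0.05                        # significance level (assumption)

# Asymptotically, sqrt(n) * D follows the Kolmogorov distribution, so the
# critical distance shrinks like 1 / sqrt(n) as the sample grows.
critical = kstwobign.ppf(1 - alpha) / np.sqrt(n_samples)

print(critical)
print(statistic > critical)
```

The critical distance here is roughly 0.028, well below the observed 0.071, which matches the very small p-value.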
