Generating a discrete probability distribution with numpy

Question

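I was following the code example at http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#subclassing-rv-discrete, which implements a random number generator for discrete values based on a normal distribution. The example (not surprisingly) works well, but when I modify it to allow only left-tailed or right-tailed results, the distribution around 0 is too low (bin 0 should contain more values). I must be hitting a boundary condition, but I can't work out which one. Am I missing something?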

This is the count of random numbers in each bin:

np.bincount(rvs) [1082 2069 1833 1533 1199  837  644  376  218  111   55   20   12    7    2 2]

This is the resulting histogram:

[image: histogram of the binned counts]
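One way to check whether the dip at 0 comes from folding with `abs()` is to compute the folded probabilities by hand. The sketch below is an illustration, not the code from the question: it uses only the standard library, approximates the truncated normal by renormalising plain normal bin masses, and the helper names `normal_cdf`, `symmetric_pmf` and `fold` are made up for this example.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def symmetric_pmf(half_width, nbound=4.0):
    """Discretised normal on the integers -half_width..half_width.

    Bin k covers [k - 0.5, k + 0.5], rescaled so that half_width maps to
    nbound standard deviations; the masses are renormalised to sum to 1.
    """
    ks = range(-half_width, half_width + 1)
    scale = nbound / half_width
    mass = [normal_cdf((k + 0.5) * scale) - normal_cdf((k - 0.5) * scale)
            for k in ks]
    total = sum(mass)
    return {k: m / total for k, m in zip(ks, mass)}

def fold(pmf):
    """Distribution of abs(K) when K follows pmf."""
    folded = {}
    for k, p in pmf.items():
        folded[abs(k)] = folded.get(abs(k), 0.0) + p
    return folded

pmf = symmetric_pmf(15)
folded = fold(pmf)
print(pmf[0], pmf[1])        # symmetric case: 0 is the most likely value
print(folded[0], folded[1])  # folded case: 1 collects both -1 and +1
```

Under these assumptions every value k >= 1 receives the mass of both -k and +k, while 0 only keeps its own mass, so bin 0 ends up with roughly half the mass of bin 1. That is the same pattern as in the counts above.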

import numpy as np
from scipy import stats

np.random.seed(42)

def draw_discrete_gaussian(rng, tail='both'):
    # number of integer support points of the distribution minus 1
    npoints = rng if tail == 'both' else rng * 2
    # integer division, so the grid below stays integer-valued
    npointsh = npoints // 2
    npointsf = float(npoints)
    # bounds for the truncated normal
    nbound = 4
    # actual bounds of truncated normal
    normbound = (1+1/npointsf) * nbound
    # integer grid
    grid = np.arange(-npointsh, npointsh+2, 1)
    # bin limits for the truncnorm
    gridlimitsnorm = (grid-0.5) / npointsh * nbound
    # used later in the analysis
    gridlimits = grid - 0.5
    grid = grid[:-1]
    probs = np.diff(stats.truncnorm.cdf(gridlimitsnorm, -normbound, normbound))
    gridint = grid

    normdiscrete = stats.rv_discrete(values=(gridint, np.round(probs, decimals=7)), name='normdiscrete')
    # print('mean = %6.4f, variance = %6.4f, skew = %6.4f, kurtosis = %6.4f' % normdiscrete.stats(moments='mvsk'))
    rnd_val = normdiscrete.rvs()
    if tail == 'both':
        return rnd_val
    if tail == 'left':
        return -abs(rnd_val)
    elif tail == 'right':
        return abs(rnd_val)


rng = 15
tail = 'right'
rvs = [draw_discrete_gaussian(rng, tail=tail) for _ in range(10000)]

if tail == 'both':
    rng_min = rng / -2.0
    rng_max = rng / 2.0
elif tail == 'left':
    rng_min = -rng
    rng_max = 0
elif tail == 'right':
    rng_min = 0
    rng_max = rng

gridlimits = np.arange(rng_min-.5, rng_max+1.5, 1)
print(gridlimits)
f, l = np.histogram(rvs, bins=gridlimits)

# cheap way of creating histogram
import matplotlib.pyplot as plt
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline

bins, edges = f, l
left,right = edges[:-1],edges[1:]
X = np.array([left, right]).T.flatten()
Y = np.array([bins, bins]).T.flatten()

# print('rvs', rvs)
print('np.bincount(rvs)', np.bincount(rvs))

plt.plot(X,Y)
plt.show()
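If the goal is a one-sided distribution in which bin 0 is the most populated, one possible approach is to build the probabilities for the non-negative integers directly instead of folding a symmetric draw. The sketch below is only one reading of what a "right tail" should mean: it assumes bin k covers [k, k + 1) on the folded axis, uses `math.erf` for the CDF of |Z| rather than `stats.truncnorm`, and the name `half_normal_pmf` is hypothetical. For the left tail the drawn values would simply be negated.

```python
import math
import numpy as np

def half_normal_pmf(npoints, nbound=4.0):
    """Probabilities for the integers 0..npoints under a folded normal.

    Assumption: bin k covers [k, k + 1) on the folded axis, scaled so that
    npoints + 1 maps to nbound standard deviations.
    """
    edges = np.arange(npoints + 2) * nbound / (npoints + 1)
    # For a standard normal Z, P(|Z| <= x) = erf(x / sqrt(2)).
    cdf = np.array([math.erf(e / math.sqrt(2.0)) for e in edges])
    probs = np.diff(cdf)
    # Renormalise because the support is cut off at nbound.
    return probs / probs.sum()

probs = half_normal_pmf(15)
gen = np.random.default_rng(42)
# Draw all samples in one call instead of rebuilding the distribution per draw.
rvs = gen.choice(np.arange(16), size=10000, p=probs)
counts = np.bincount(rvs, minlength=16)
print(counts)
```

With this binning the probabilities decrease monotonically from bin 0, so the dip at 0 does not appear. Drawing all samples in a single call also avoids constructing a new distribution object for every random number.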


1 Answer
