根据给定概率密度函数生成随机数

Question

我想在Python中指定一个分布的概率密度函数，然后从这个分布中随机选取N个数字。我该怎么做呢？

Answer 1

这是我用来根据给定的概率密度函数获取一个随机数的函数。我采用了一种类似蒙特卡洛的方法。当然，如果你需要生成n个随机数，可以调用这个函数n次。

    """
    Draws a random number from given probability density function.

    Parameters
    ----------
        pdf       -- the function pointer to a probability density function of form P = pdf(x)
        interval  -- the resulting random number is restricted to this interval
        pdfmax    -- the maximum of the probability density function
        integers  -- boolean, indicating if the result is desired as integer
        max_iterations -- maximum number of 'tries' to find a combination of random numbers (rand_x, rand_y) located below the function value calc_y = pdf(rand_x).

    returns a single random number according the pdf distribution.
    """
    def draw_random_number_from_pdf(pdf, interval, pdfmax = 1, integers = False, max_iterations = 10000):
        for i in range(max_iterations):
            if integers == True:
                rand_x = np.random.randint(interval[0], interval[1])
            else:
                rand_x = (interval[1] - interval[0]) * np.random.random(1) + interval[0] #(b - a) * random_sample() + a

            rand_y = pdfmax * np.random.random(1) 
            calc_y = pdf(rand_x)

            if(rand_y <= calc_y ):
                return rand_x

        raise Exception("Could not find a matching random number within pdf in " + max_iterations + " iterations.")

在我看来，如果你不需要获取非常多的随机变量，这个解决方案的表现比其他方案要好。另一个好处是，你只需要概率密度函数（PDF），而不需要计算累积分布函数（CDF）、反累积分布函数或权重。

Answer 2

一般来说，你需要的是反累积分布函数。一旦你有了这个，生成符合该分布的随机数就简单多了：

import random

def sample(n):
    return [ icdf(random.random()) for _ in range(n) ]

或者，如果你使用NumPy的话：

import numpy as np

def sample(n):
    return icdf(np.random.random(n))

在这两种情况下，icdf就是反累积分布函数，它接受一个介于0和1之间的值，然后输出对应的分布值。

为了说明icdf的特点，我们以一个简单的均匀分布为例，这个分布的值在10到12之间：

在10到12之间，概率分布函数是0.5，其他地方是0
累积分布函数在10以下是0（没有样本在10以下），在12以上是1（没有样本在12以上），在这两个值之间是线性增加的（这是概率分布函数的积分）
反累积分布函数只在0到1之间定义。在0时对应10，在12时对应1，并且在这两个值之间是线性变化的

当然，最难的部分是获取反累积密度函数。这实际上取决于你的分布，有时候你可能会有一个解析函数，有时候你可能需要用插值法。数值方法可能会很有用，因为数值积分可以用来创建累积分布函数，而插值法可以用来反转它。

根据给定概率密度函数生成随机数

2 个回答

撰写回答