Python中用于非常大数字的二项检验

9 投票

6 回答

11307 浏览

提问于 2025-04-16 00:04

我需要在Python中做一个二项式检验，能够处理大约10000个的'n'值。

我用scipy.misc.comb实现了一个快速的binomial_test函数，但它的限制在n=1000左右。我猜是因为在计算阶乘或组合时，达到了可以表示的最大数字。下面是我的函数：

from scipy.misc import comb
def binomial_test(n, k):
    """Calculate binomial probability
    """
    p = comb(n, k) * 0.5**k * 0.5**(n-k)
    return p

我该如何使用Python本身的函数（或者numpy、scipy等）来计算这个二项式概率呢？如果可以的话，我需要兼容scipy 0.7.2的代码。

非常感谢！

组合数学数值稳定性 scipy库阶乘计算二项检验大数字计算概率统计

6 个回答

GMPY 还支持扩展精度的浮点数计算。举个例子：

>>> from gmpy import *
>>>
>>> def f(n,k,p,prec=256):
...     return mpf(comb(n,k),prec) * mpf(p,prec)**k * mpf(1-p,prec)**(n-k)
...
>>> print(f(1000,500,0.5))
0.0252250181783608019068416887621024545529410193921696384762532089115753731615931
>>>

我设置了浮点数的精度为256位。顺便提一下，source forge上的版本已经很旧了。现在的版本在code.google.com上维护，并且支持Python 3.x。（声明：我是gmpy的现任维护者。）

casevh

回答于 2025-04-16 由 Python大师

分享举报

任何看起来像 comb(n, k) * 0.5**k * 0.5**(n-k) 的解决方案在处理大数字 n 时都不太靠谱。在大多数（可能是所有）平台上，Python 中浮点数能存储的最小值大约是 2**-1022。当 n-k 或 k 的值很大时，右边的结果会被四舍五入到 0。同样，comb(n, k) 的值可能会大到无法放入浮点数中。

一个更稳妥的方法是计算概率密度函数，这个函数是通过计算累积分布函数中两个连续点之间的差值得到的。这个可以用正则化的不完全贝塔函数来计算（可以在 SciPy 的“特殊函数”包中找到）。数学上可以表示为：

pdf(p, n, k) = cdf(p, n, k) - cdf(p, n, k-1)

另一种选择是使用正态近似，对于大数字 n 来说，这种方法相当准确。如果你关心速度，这可能是更好的选择：

from math import *

def normal_pdf(x, m, v):
    return 1.0/sqrt(2*pi*v) * exp(-(x-m)**2/(2*v))

def binomial_pdf(p, n, k):
    if n < 100:
        return comb(n, k) * p**k * p**(n-k)  # Fall back to your current method
    return normal_pdf(k, n*p, n*p*(1.0-p))

我没有测试过代码，但这应该能给你一个大致的思路。

回答于 2025-04-16 由 Python大师

分享举报

编辑补充一下这个评论：请注意，正如Daniel Stutzbach提到的，“二项检验”可能不是原提问者想要的内容（尽管他确实使用了这个表达）。他似乎是在询问二项分布的概率密度函数，而这并不是我下面要提到的内容。

你试过使用scipy.stats.binom_test吗？

rbp@apfelstrudel ~$ python
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy import stats
>>> print stats.binom_test.__doc__

    Perform a test that the probability of success is p.

    This is an exact, two-sided test of the null hypothesis
    that the probability of success in a Bernoulli experiment
    is `p`.

    Parameters
    ----------
    x : integer or array_like
        the number of successes, or if x has length 2, it is the
        number of successes and the number of failures.
    n : integer
        the number of trials.  This is ignored if x gives both the
        number of successes and failures
    p : float, optional
        The hypothesized probability of success.  0 <= p <= 1. The
        default value is p = 0.5

    Returns
    -------
    p-value : float
        The p-value of the hypothesis test

    References
    ----------
    .. [1] http://en.wikipedia.org/wiki/Binomial_test


>>> stats.binom_test(500, 10000)
4.9406564584124654e-324

小补充一下，添加文档链接：http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom_test.html#scipy.stats.binom_test

顺便说一下：在scipy 0.7.2版本和当前的0.8开发版上都可以使用。

回答于 2025-04-16 由 Python大师

分享举报

Python中用于非常大数字的二项检验

6 个回答

撰写回答