Gaussian kernel density estimation in Python with large numbers

from scipy.stats import gaussian_kde
from numpy import linspace
import matplotlib.pyplot as plt

# 'data' is a 1D array that contains the initial numbers, 37231 to 56661
xmin = min(data)
xmax = max(data)

# get 1000 evenly spaced points for the x axis
x = linspace(xmin, xmax, 1000)
nPoints = len(x)

# compute the actual kernel density
density = gaussian_kde(data)
y = density(x)

# print the output data
for i in range(nPoints):
    print("%s %s" % (x[i], y[i]))

plt.plot(x, y)
plt.show()

2 answers

User

#1 · edited 2024-05-17 18:51:45

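I wrote a function to do this. The bandwidth is a parameter of the function: a smaller number gives a spikier curve, a larger number gives a smoother one. The default is 0.3.

It works in an IPython notebook started with --pylab=inline.

The number of bins is computed in the code with the Freedman-Diaconis rule, so it varies with the number of observations and the spread of your data.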

import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

def hist_with_kde(data, bandwidth=0.3):
    # set the number of bins using the Freedman-Diaconis rule:
    # bin width = 2 * IQR / n**(1/3), number of bins = range / bin width
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)

    n_cbrt = len(data) ** (1 / 3)
    rng = max(data) - min(data)
    bin_width = 2 * (q3 - q1) / n_cbrt
    bins = int(rng / bin_width)

    x = np.linspace(min(data), max(data), 200)

    # a scalar bw_method is used directly as the bandwidth factor
    kde = stats.gaussian_kde(data, bw_method=bandwidth)

    plt.plot(x, kde(x), 'r')  # estimated density
    plt.hist(data, bins=bins, density=True)  # normalized histogram

data = np.random.randn(500)
hist_with_kde(data, 0.25)
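
To make the "smaller = spikier, larger = smoother" claim concrete, here is a minimal sketch that compares two bandwidth factors without plotting. The `roughness` helper, the seed, and the values 0.05 and 1.0 are illustrative assumptions, not part of the answer above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
data = rng.standard_normal(500)
x = np.linspace(data.min(), data.max(), 200)

def roughness(y):
    # hypothetical helper: sum of absolute second differences,
    # larger values mean a more jagged curve
    return np.abs(np.diff(y, n=2)).sum()

# a scalar bw_method is used directly as the bandwidth factor
narrow = stats.gaussian_kde(data, bw_method=0.05)(x)
wide = stats.gaussian_kde(data, bw_method=1.0)(x)

print(roughness(narrow), roughness(wide))
```

The curve built with the small factor should come out noticeably rougher than the one built with the large factor, which is the trade-off the bandwidth parameter controls.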

User

#2 · edited 2024-05-17 18:51:45

I think what is happening is that your data array is made up of integers, which leads to the following problem:

>>> import numpy, scipy.stats
>>> 
>>> data = numpy.random.randint(37231, 56661,size=10)
>>> xmin, xmax = min(data), max(data)
>>> x = numpy.linspace(xmin, xmax, 10)
>>> 
>>> density = scipy.stats.gaussian_kde(data)
>>> density.dataset
array([[52605, 45451, 46029, 40379, 48885, 41262, 39248, 38247, 55987,
        44019]])
>>> density(x)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

But if we use floats:

>>> density = scipy.stats.gaussian_kde(data*1.0)
>>> density.dataset
array([[ 52605.,  45451.,  46029.,  40379.,  48885.,  41262.,  39248.,
         38247.,  55987.,  44019.]])
>>> density(x)
array([  4.42201513e-05,   5.51130237e-05,   5.94470211e-05,
         5.78485526e-05,   5.21379448e-05,   4.43176188e-05,
         3.66725694e-05,   3.06297511e-05,   2.56191024e-05,
         2.01305127e-05])
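
Applied to the question's setup, the fix can be sketched as an explicit cast before the estimator is built. The seed and the name `data_float` are assumptions made for illustration. Whether a particular SciPy version still returns all zeros for integer input has not been verified here, but the cast is harmless either way.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
data = rng.integers(37231, 56661, size=10)  # integer dtype, as in the question

# cast to float explicitly before estimating the density
data_float = np.asarray(data, dtype=float)
x = np.linspace(data_float.min(), data_float.max(), 10)

density = stats.gaussian_kde(data_float)
y = density(x)
print(y)
```

`np.asarray(data, dtype=float)` does the same job as `data*1.0` in the session above, and it states the intent more directly.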
