Probability density function from a histogram in Python

import numpy as np

# create random data points
mu = 10
sigma = 5
n = 1000
datapoints = np.random.normal(mu, sigma, n)

# create a normalized histogram of the data
bins = np.linspace(0, 20, 21)
H, bins = np.histogram(datapoints, bins, density=True)

1 Answer

User

#1 · Posted on 2024-05-16 20:40:14

You can use the cumulative distribution function (CDF) to generate random numbers from an arbitrary distribution, as described here.
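
To illustrate the idea (usually called inverse transform sampling), here is a minimal sketch for a distribution whose CDF can be inverted by hand. The exponential distribution, the function name sample_exponential and the rate value are assumptions chosen for illustration; they are not part of the question.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    """Draw samples from an exponential distribution via its inverse CDF."""
    # Uniform draws on [0, 1) play the role of CDF values.
    u = rng.uniform(0.0, 1.0, size=size)
    # CDF: F(x) = 1 - exp(-rate * x), so the inverse is x = -ln(1 - u) / rate.
    return -np.log1p(-u) / rate

rng = np.random.default_rng(0)
samples = sample_exponential(rate=2.0, size=100_000, rng=rng)
print(samples.mean())  # should be close to 1 / rate
```

The same recipe applies to a histogram, except that the inverse CDF has to be built numerically instead of written down in closed form.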

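Building a smooth cumulative distribution function from a histogram is not trivial. You can use interpolation, e.g. scipy.interpolate.interp1d(), for values between the bin centers, which works well for histograms with a fairly large number of bins and items. However, you have to decide on the form of the tails of the probability function, i.e. for values below the smallest bin or above the largest bin. You could give your distribution Gaussian tails (based, for example, on fitting a Gaussian to the histogram), use any other tail shape that suits your problem, or simply truncate the distribution.

Example: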

import numpy
import scipy.interpolate
import random
import matplotlib.pyplot as pyplot

# create some normally distributed values and make a histogram
a = numpy.random.normal(size=10000)
counts, bins = numpy.histogram(a, bins=100, density=True)
cum_counts = numpy.cumsum(counts)
bin_widths = (bins[1:] - bins[:-1])

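# generate more values with same distribution
# x is the empirical CDF at the right edge of each bin
# (cum_counts*bin_widths is valid here because all bins have equal width)
x = cum_counts*bin_widths
y = bins[1:]
# interpolating y as a function of x gives the inverse CDF
inverse_density_function = scipy.interpolate.interp1d(x, y)
b = numpy.zeros(10000)
for i in range(len( b )):
    # draw u inside [x[0], x[-1]] so interp1d never has to extrapolate
    u = random.uniform( x[0], x[-1] )
    b[i] = inverse_density_function( u )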

# plot both
pyplot.hist(a, 100)
pyplot.hist(b, 100)
pyplot.show()

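This does not handle the tails, and it could handle the bin edges better, but it should get you started on using a histogram to generate more values with the same distribution.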
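
One possible way to treat the bin edges more carefully is to regard the histogram as a piecewise-constant density: pick a bin with probability equal to its mass, then draw uniformly inside that bin. This is equivalent to linearly interpolating a CDF that starts at 0 at the left-most edge and reaches 1 at the right-most edge. The sketch below simply truncates the distribution to the histogram range (no tails); the function name sample_from_histogram and the test data are assumptions for illustration.

```python
import numpy as np

def sample_from_histogram(density, bin_edges, size, rng):
    """Sample from a histogram treated as a piecewise-constant density.

    The result is truncated to [bin_edges[0], bin_edges[-1]]; tails are ignored.
    """
    # Probability mass of each bin (also valid for unequal bin widths).
    mass = density * np.diff(bin_edges)
    mass = mass / mass.sum()
    # Choose a bin for every sample, weighted by the bin's mass.
    idx = rng.choice(len(mass), size=size, p=mass)
    # Place each sample uniformly between the edges of its bin.
    return rng.uniform(bin_edges[idx], bin_edges[idx + 1])

rng = np.random.default_rng(0)
data = rng.normal(10, 5, size=10_000)
density, bin_edges = np.histogram(data, bins=50, density=True)
new_samples = sample_from_histogram(density, bin_edges, 10_000, rng)
print(data.mean(), new_samples.mean())
print(data.std(), new_samples.std())
```

Because empty bins get zero mass, they are never selected, so this variant does not need a strictly increasing CDF the way an interpolated inverse does.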

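Alternatively, you could try fitting a specific known distribution to your values (which I think is what you mention in the question), but the non-parametric approach above is more general.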
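
As a rough sketch of the parametric route, assuming the data really are close to normal (as in the question's np.random.normal example), scipy.stats.norm.fit gives maximum-likelihood estimates of the mean and standard deviation, and the fitted distribution can then be sampled directly. The variable names here are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=5, size=1000)

# Maximum-likelihood estimates of the normal parameters.
mu_hat, sigma_hat = stats.norm.fit(data)

# Frozen distribution with the fitted parameters; pdf/cdf/rvs are available.
fitted = stats.norm(loc=mu_hat, scale=sigma_hat)
new_samples = fitted.rvs(size=1000, random_state=rng)

print(mu_hat, sigma_hat)
```

Unlike the histogram-based approach, this gives smooth tails for free, but only if the chosen family actually describes the data.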
