scipy，日志正态分布 - 参数

39 投票

5 回答

35303 浏览

提问于 2025-04-17 09:37

我想用Python中的scipy.stats.lognormal.fit来给我的数据拟合一个对数正态分布。根据手册，fit函数会返回三个参数：shape（形状）、loc（位置）、scale（尺度）。但是，对数正态分布通常只需要两个参数：均值和标准差。

那么，如何理解从scipy的fit函数得到的结果呢？我该如何得到均值和标准差呢？

scipy 标准差参数估计数据拟合对数正态分布统计分布均值形状参数

5 个回答

我觉得这会对你有帮助。我也花了很长时间在找这个问题的解决办法，最后终于找到了解决方案。在我的情况下，我试图使用 scipy.stats.lognorm 模块将一些数据拟合到对数正态分布。但是，当我最终得到了模型参数后，却找不到用y数据的均值和标准差来复现我的结果的方法。

在下面的代码中，我解释了如何从均值和标准差参数生成一个正态分布的数据样本，使用的是 scipy.stats.norm 模块。利用这些数据，我拟合了正态模型（norm_dist_fitted），并且还创建了一个使用从数据中提取的均值和标准差（mu, sigma）的正态模型。

原始生成数据的模型、拟合的模型和通过（mu-sigma）对生成的模型在图表中进行了比较。

图1

在代码的下一部分，我使用正态数据生成一个对数正态分布的样本。要做到这一点，请注意，对数正态样本将是原始样本的指数。因此，指数样本的均值和标准差将是（exp(mu) 和 exp(sigma)）。

我将生成的数据拟合到一个 lognormal 模型（因为我的样本的对数（exp(x)）是正态分布的，并且符合对数正态模型的假设）。

要从原始数据（x）的均值和标准差生成一个对数正态模型，代码将是：

lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

但是，如果你的数据已经在指数空间（exp(x)）中，那么你需要使用：

muX = np.mean(np.log(x))
sigmaX = np.std(np.log(x))
scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX)

图2

import scipy
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

mu = 10 # Mean of sample !!! Make sure your data is positive for the lognormal example 
sigma = 1.5 # Standard deviation of sample
N = 2000 # Number of samples

norm_dist = scipy.stats.norm(loc=mu, scale=sigma) # Create Random Process
x = norm_dist.rvs(size=N) # Generate samples

# Fit normal
fitting_params = scipy.stats.norm.fit(x)
norm_dist_fitted = scipy.stats.norm(*fitting_params)
t = np.linspace(np.min(x), np.max(x), 100)

# Plot normals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x, ax=ax, norm_hist=True, kde=False, label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
ax.legend(loc='lower right')
plt.show()


# The lognormal model fits to a variable whose log is normal
# We create our variable whose log is normal 'exponenciating' the previous variable

x_exp = np.exp(x)
mu_exp = np.exp(mu)
sigma_exp = np.exp(sigma)

fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
t = np.linspace(np.min(x_exp), np.max(x_exp), 100)

# Here is the magic I was looking for a long long time
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

# The trick is to understand these two things:
# 1. If the EXP of a variable is NORMAL with MU and STD -> EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# 2. If your variable (x) HAS THE FORM of a LOGNORMAL, the model will be scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX)
# with:
#    - muX = np.mean(np.log(x))
#    - sigmaX = np.std(np.log(x))


# Plot lognormals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x_exp, ax=ax, norm_hist=True, kde=False,
             label='Data exp(X)~N(mu={0:.1f}, sigma={1:.1f})\n X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist_fitted.mean(), lognorm_dist_fitted.std()))
ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist.mean(), lognorm_dist.std()))
ax.legend(loc='lower right')
plt.show()

回答于 2025-04-17 由 Python大师

分享举报

我花了一些时间搞清楚这个问题，想在这里记录一下：如果你想从lognorm.fit的三个返回值中获取某个点x的概率密度（我们称这三个值为(shape, loc, scale)），你需要使用这个公式：

x = 1 / (shape*((x-loc)/scale)*sqrt(2*pi)) * exp(-1/2*(log((x-loc)/scale)/shape)**2) / scale

所以这个公式可以写成这样（loc是µ，shape是σ，scale是α）：

$x = \frac{1}{(x-\mu)\cdot\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{log(\frac{x-\mu}{\alpha})^2}{2\sigma^2}}$

回答于 2025-04-17 由 Python大师

分享举报

在scipy这个库里，分布的设置是比较通用的，主要涉及两个参数：位置（loc）和尺度（scale）。位置参数就是用来把分布往左或往右移动的，而尺度参数则是用来压缩或拉伸分布的。

对于两个参数的对数正态分布来说，“均值”和“标准差”分别对应于log(scale)和shape（你可以把loc设为0）。

下面的内容展示了如何拟合一个对数正态分布，以找到我们关心的两个参数：

In [56]: import numpy as np

In [57]: from scipy import stats

In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3)

In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3)

In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting

In [61]: shape, loc, scale
Out[61]: (2.9212650122639419, 0, 21318.029350592606)

In [62]: np.log(scale), shape  # mu, sigma
Out[62]: (9.9673084420467362, 2.9212650122639419)

回答于 2025-04-17 由 Python大师

分享举报

scipy，日志正态分布 - 参数

5 个回答

撰写回答