Kurtosis of a normal mixture distribution: error in my implementation
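I have been using the formulas from Frühwirth-Schnatter (2006) to analyze normal mixture distributions. I implemented the formula for the fourth moment and converted it to excess kurtosis. The analytic value differs substantially from the excess kurtosis I get from Monte Carlo samples. I would like to know whether the error is in my code or in the formula.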
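For reference, these are the standard moments of a two-component normal mixture. The weights $p_1$ and $p_2 = 1 - p_1$ and the symbols $\mu_k$ and $\sigma_k$ follow the variable names in the code below, not necessarily the book's notation. The variance is $\mathbb{E}[X^2]$ minus $\mu^2$, so $\mu^2$ is subtracted once from the whole sum, not once per component:

$$
\begin{aligned}
\mu &= p_1\mu_1 + p_2\mu_2,\\
\sigma^2 &= \sum_{k=1}^{2} p_k\left(\sigma_k^2 + \mu_k^2\right) - \mu^2,\\
\mathbb{E}\left[(X-\mu)^4\right] &= \sum_{k=1}^{2} p_k\left[(\mu_k-\mu)^4 + 6(\mu_k-\mu)^2\sigma_k^2 + 3\sigma_k^4\right],\\
\text{excess kurtosis} &= \frac{\mathbb{E}\left[(X-\mu)^4\right]}{\sigma^4} - 3.
\end{aligned}
$$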
Here is a simplified example of my code:
```python
import numpy as np
from scipy.stats import kurtosis

get_p2 = lambda p1: 1 - p1

def simulate(p1, mu1, mu2, sigma1, sigma2, n):
    param_draws = np.random.choice([0, 1], p = [p1, get_p2(p1)], size = n)
    mu_param = np.where(param_draws == 0, mu1, mu2)
    sigma_param = np.where(param_draws == 0, sigma1, sigma2)
    return np.array([np.random.normal(loc = mu_param[i], scale = sigma_param[i]) for i in range(n)])

def mixture_mu(p1, mu1, mu2):
    return p1 * mu1 + get_p2(p1) * mu2

def mixture_variance(p1, mu1, mu2, sigma1, sigma2):
    mu = mixture_mu(p1, mu1, mu2)
    part_1 = p1 * (np.power(sigma1, 2) + np.power(mu1, 2)) - np.power(mu, 2)
    part_2 = get_p2(p1) * (np.power(sigma2, 2) + np.power(mu2, 2)) - np.power(mu, 2)
    return part_1 + part_2

def excess_kurtosis(p1, mu1, mu2, sigma1, sigma2):
    mu = mixture_mu(p1, mu1, mu2)
    variance = mixture_variance(p1, mu1, mu2, sigma1, sigma2)
    part_1 = np.power(mu1 - mu, 4) + (6 * np.power(mu1 - mu, 2) * np.power(sigma1, 2)) + 3 * np.power(sigma1, 4)
    part_2 = np.power(mu2 - mu, 4) + (6 * np.power(mu2 - mu, 2) * np.power(sigma2, 2)) + 3 * np.power(sigma2, 4)
    return (p1 * part_1 + get_p2(p1) * part_2) / np.power(variance, 2) - 3

p1 = 0.5; mu1 = 0; mu2 = 1; sigma1 = 2; sigma2 = 3
X = simulate(p1, mu1, mu2, sigma1, sigma2, n = 50_000)
analytic_excess_kurtosis = excess_kurtosis(p1, mu1, mu2, sigma1, sigma2)
empirical_excess_kurtosis = kurtosis(X)
print("Analytic:", analytic_excess_kurtosis)
print("Empirical:", empirical_excess_kurtosis)
```
The gap between the analytic and the empirical excess kurtosis grows as the difference between mu1 and mu2 increases.
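The sketch below is one way to check this observation. It is a minimal sketch, not an authoritative answer. It assumes the textbook mixture moments, where the variance is $\sum_k p_k(\sigma_k^2 + \mu_k^2) - \mu^2$.

In `mixture_variance` above, both `part_1` and `part_2` subtract `np.power(mu, 2)`, so $\mu^2$ is removed twice. The variance is therefore underestimated by exactly $\mu^2 = (p_1\mu_1 + p_2\mu_2)^2$. With `mu1 = 0`, that shortfall grows as `mu2` increases, which would be consistent with the growing gap.

The helper names `mixture_moments` and `variance_subtracting_twice` are hypothetical. The simulation draws components in a vectorized way with `numpy.random.default_rng`. This replaces the per-sample loop but has the same distribution.

```python
import numpy as np
from scipy.stats import kurtosis

def mixture_moments(p1, mu1, mu2, sigma1, sigma2):
    """Mean, variance and excess kurtosis of a two-component normal mixture."""
    p = np.array([p1, 1.0 - p1])
    m = np.array([mu1, mu2], dtype=float)
    s = np.array([sigma1, sigma2], dtype=float)
    mu = np.sum(p * m)
    # Var = E[X^2] - mu^2, with mu^2 subtracted once for the whole mixture
    variance = np.sum(p * (s**2 + m**2)) - mu**2
    d = m - mu
    # Fourth central moment, summed over components
    fourth = np.sum(p * (d**4 + 6 * d**2 * s**2 + 3 * s**4))
    return mu, variance, fourth / variance**2 - 3.0

def variance_subtracting_twice(p1, mu1, mu2, sigma1, sigma2):
    """Hypothetical helper reproducing the question's variance: mu^2 subtracted in each part."""
    mu = p1 * mu1 + (1 - p1) * mu2
    return (p1 * (sigma1**2 + mu1**2) - mu**2) + ((1 - p1) * (sigma2**2 + mu2**2) - mu**2)

rng = np.random.default_rng(0)
p1, mu1, sigma1, sigma2 = 0.5, 0.0, 2.0, 3.0
n = 200_000
for mu2 in (1.0, 3.0, 6.0):
    mu, var, ek = mixture_moments(p1, mu1, mu2, sigma1, sigma2)
    # Difference between the two variance expressions; algebraically equal to mu**2
    gap = var - variance_subtracting_twice(p1, mu1, mu2, sigma1, sigma2)
    # True -> component 1 (probability p1), False -> component 2
    comp = rng.random(n) < p1
    x = np.where(comp, rng.normal(mu1, sigma1, n), rng.normal(mu2, sigma2, n))
    # scipy's kurtosis returns excess (Fisher) kurtosis by default
    print(f"mu2={mu2}: gap={gap:.4f} (mu^2={mu**2:.4f}), "
          f"analytic={ek:.4f}, empirical={kurtosis(x):.4f}")
```

For each `mu2`, the printed `gap` equals `mu**2`, and it grows as `mu2` moves away from `mu1 = 0`.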
What is going on here? Is it a simple mistake somewhere in my code, or something else? Thanks in advance.