How should I interpret the divergence and low effective sample size warnings?

Posted 2024-05-16 20:45:16


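I am running a tutorial to teach some beginners how to use PyMC3 for regression. Using TED talk data as an example, I am trying to find out how the number of comments, the number of languages a talk has been transcribed into, and the video duration predict a TED talk's popularity. I ran the following code in PyMC3: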

    import pymc3 as pm

    # ted_talk is assumed to be a pandas DataFrame with the columns used below
    with pm.Model() as model:
        intercept = pm.Normal('Intercept', mu=5, sigma=3)
        beta_duration = pm.Normal('duration', mu=0.05, sigma=0.3)
        beta_languages = pm.Normal('languages', mu=0.05, sigma=0.1)
        beta_comments = pm.Normal('comments', mu=0.05, sigma=0.1)
        epsilon = pm.HalfCauchy('epsilon', beta=5)

        mu = (intercept + beta_duration * ted_talk['duration']
              + beta_languages * ted_talk['languages']
              + beta_comments * ted_talk['comments'])
        likelihood = pm.Normal('likelihood', mu=mu, sigma=epsilon, observed=ted_talk['views'])
        trace = pm.sample(4000, tune=2000, chains=3)

Result:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (3 chains in 1 job)
NUTS: [epsilon, comments, languages, duration, Intercept]

Sampling 3 chains for 2_000 tune and 4_000 draw iterations (6_000 + 12_000 draws total) took 91 seconds.
There were 973 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5480812333460533, but should be close to 0.8. Try to increase the number of tuning steps.
There were 973 divergences after tuning. Increase `target_accept` or reparameterize.
There were 973 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.

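Question 1: What are possible reasons the MCMC run still reports divergences after tuning? The program suggests both increasing target_accept and increasing the number of tuning steps; which of the two do you think is more useful? If that depends heavily on the situation, I would like to understand why.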
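For context on the "reparameterize" half of that warning: raising `target_accept` (for example `pm.sample(..., target_accept=0.95)`) makes NUTS take smaller steps, whereas reparameterizing changes the geometry the sampler has to explore. One simple reparameterization for a linear regression is to standardize the predictors and the outcome, so that priors such as Normal(0.05, 0.1) sit on roughly the same scale as the data. The sketch below uses NumPy only. The arrays are made-up stand-ins for the `ted_talk` columns, `standardize` is a hypothetical helper, and whether this removes the divergences in this particular model is an assumption, not something the output above confirms.

```python
import numpy as np

def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(42)

# Made-up stand-ins for the ted_talk columns (NOT the real data).
duration = rng.uniform(200, 1500, size=500)            # seconds
languages = rng.integers(1, 60, size=500)              # transcript languages
comments = rng.integers(0, 2000, size=500)             # comment counts
views = rng.lognormal(mean=13.5, sigma=1.0, size=500)  # heavily skewed counts

# After standardizing, every column has mean 0 and standard deviation 1,
# so a slope of 0.1 means "0.1 sd of views per 1 sd of the predictor".
X = np.column_stack([
    standardize(duration),
    standardize(languages),
    standardize(comments),
])
y = standardize(views)
```

The standardized `X` and `y` would then replace the raw columns in the model, with the priors re-centred accordingly (for example an intercept prior centred at 0 instead of 5).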

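Question 2: If the effective sample size is too small, what problems can that cause? I have not seen any "threshold" for deciding whether the number of effective samples is too large or too small (including in mcmc_diagnostic), so how many effective samples do you consider reasonable for a Bayesian regression model?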
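To make "effective samples" concrete: autocorrelated draws carry less information than the same number of independent draws, and the effective sample size (ESS) estimates how many independent draws the chain is worth. The sketch below is a deliberately crude single-chain estimator, N / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag. It is not the multi-chain estimator that ArviZ reports as `ess_bulk` and `ess_tail` in `az.summary`, and the two synthetic chains are assumptions chosen only to show the effect.

```python
import numpy as np

def effective_sample_size(draws):
    """Crude single-chain ESS: N / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(draws, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        # Sample autocorrelation at this lag.
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = np.random.default_rng(0)
n_draws = 4000

# Chain 1: independent draws, the ideal case.
independent = rng.normal(size=n_draws)

# Chain 2: a strongly autocorrelated AR(1) chain, mimicking a sampler that mixes slowly.
noise = rng.normal(size=n_draws)
sticky = np.empty(n_draws)
sticky[0] = noise[0]
for t in range(1, n_draws):
    sticky[t] = 0.9 * sticky[t - 1] + noise[t]

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(f"independent chain ESS: {ess_independent:.0f} of {n_draws}")
print(f"sticky chain ESS:      {ess_sticky:.0f} of {n_draws}")
```

Both chains contain 4000 draws, but the sticky chain's ESS comes out far lower. The "smaller than 10%" warning in the output above flags this situation: for some parameters the ESS is below a tenth of the total number of draws.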

Thank you very much for your time! Your help means a lot.

