Learning from sparse data with Kevin Murphy's Bayesian Network Toolbox

Posted 2024-06-07 14:43:50


I have run into a problem with the parameters of a Bayesian network trained with Kevin Murphy's Bayesian Network Toolbox (hereafter BNT).

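Ultimately, I want to export the trained parameters to another package in Python, such as pgmpy. When I did this, I noticed that some of the local conditional probability distributions do not sum to one. In fact, pgmpy throws an error if you try to use the Bayesian network for prediction while this condition is not met.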
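The condition pgmpy enforces is that every conditional distribution sums to one. A minimal pure-Python sketch of that check, assuming the CPT has already been copied into a nested list laid out BNT-style as `cpt[parent_state][child_state]`, so that each row is one distribution. The function name and the numbers are hypothetical, made up to have the same shape as the 3×3 CPT of O(2) in the example below:

```python
from math import isclose


def non_normalized_rows(cpt, tol=1e-9):
    """Return the indices of rows that are not valid probability distributions.

    Assumes a single-parent CPT stored as cpt[parent_state][child_state],
    so each row should be P(child | parent = row index) and sum to one.
    """
    bad = []
    for i, row in enumerate(cpt):
        if not isclose(sum(row), 1.0, abs_tol=tol):
            bad.append(i)
    return bad


# Made-up CPT for O(2) given O(1), with the never-observed third state
cpt_o2 = [
    [0.3, 0.7, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0],  # parent state 3 never occurred in the data
]
print(non_normalized_rows(cpt_o2))  # → [2]
```

Running such a check before building the pgmpy model points directly at the parent configurations that have no data.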

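I think this is caused by BNT learning parameters on sparse data, where some values of some variables never actually appear in the data. I first ran into this problem when using expectation maximization (EM) with the junction tree algorithm to train a complex network with hidden variables. However, I can reproduce the same problem with a simple Markov chain and maximum likelihood estimation (MLE). In this example, we define a network and use it to sample some test data. We then create a new version of the network in which each variable has an extra "dummy" state that never appears in the test data.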
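To see why sparse data produces an all-zero row, consider what MLE does for a tabular CPD: it counts (parent, child) co-occurrences and divides each row by its total. If a parent state never occurs, its row total is zero and there is nothing to normalise. The sketch below is not BNT's code; the function `estimate_cpt`, the toy data and the `alpha` parameter are hypothetical. It assumes, consistent with the result reported in the question, that such a row is left as zeros rather than raising a division error. The `alpha` pseudo-count is additive (symmetric Dirichlet, or Laplace) smoothing, a swapped-in technique that the question does not use:

```python
def estimate_cpt(pairs, n_parent, n_child, alpha=0.0):
    """Estimate P(child | parent) from 1-based (parent, child) samples.

    alpha = 0 gives plain MLE; rows whose parent state never occurs are
    left as all zeros. alpha > 0 adds the same pseudo-count to every cell,
    so a never-seen parent state ends up with a uniform row.
    """
    counts = [[alpha] * n_child for _ in range(n_parent)]
    for parent, child in pairs:
        counts[parent - 1][child - 1] += 1
    cpt = []
    for row in counts:
        total = sum(row)
        cpt.append([c / total for c in row] if total > 0 else [0.0] * n_child)
    return cpt


# Toy data: both variables only take states 1 and 2; state 3 is the "dummy" state
data = [(1, 1), (1, 2), (2, 2), (2, 2)]
print(estimate_cpt(data, 3, 3))             # last row is [0.0, 0.0, 0.0]
print(estimate_cpt(data, 3, 3, alpha=1.0))  # last row is uniform
```

With `alpha > 0` every row is a proper distribution; the trade-off is that rows backed by data are also pulled slightly towards uniform.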

```matlab
O=1:2; % We have 2 observable nodes O(1) and O(2)

node_sizes=repelem(2, 2);
%   initially both variables have just two states, but this will change later
num_nodes=length(node_sizes);
onodes=O;

dag=false(num_nodes, num_nodes);
dag(1,2)=true; % O(1) -> O(2)

ix2var = string({'o1','o2'});

seed=42;
rand('state', seed);
randn('state', seed);

% the initial network that will create some test data
bnet = mk_bnet(dag, node_sizes, 'names', cellstr(ix2var), 'observed', onodes);

% set some arbitrary probabilities to get some test data
bnet.CPD{O(1)} = tabular_CPD(bnet, O(1), 'CPT', [0.2, 0.8]);
% BNT reads the vector in column-major order over (O(1), O(2)):
% [P(o2=1|o1=1) P(o2=1|o1=2) P(o2=2|o1=1) P(o2=2|o1=2)]
bnet.CPD{O(2)} = tabular_CPD(bnet, O(2), 'CPT', [0.3, 0.6, 0.7, 0.4]);

N = length(onodes);
nsamples = 200;
samples = cell(N, nsamples);
for i=1:nsamples
  samples(:,i) = sample_bnet(bnet);
end

node_sizes=repelem(3, 2); % introduce the imaginary state into our observable variables

% redefine the network with the new imaginary states
bnet = mk_bnet(dag, node_sizes, 'names', cellstr(ix2var), 'observed', onodes);

% initialise the CPTs prior to training (without a 'CPT' argument BNT
% draws random values; plain MLE replaces them with normalised counts)
bnet.CPD{O(1)} = tabular_CPD(bnet, O(1));
bnet.CPD{O(2)} = tabular_CPD(bnet, O(2));

% learn parameters using MLE
bnet = learn_params(bnet, samples);

% copy CPT parameters to cpts
cpts=cell(1, num_nodes);
for i = 1:num_nodes
    cpts{i} = struct(bnet.CPD{i}).CPT;
end

% now check that sum_{i=1..3} P(O(2)=i | O(1)=3) == 1
sum(cpts{O(2)}(3,:))
% ans =
% 
%      0
% Oh no! It's zero!
```
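Before pgmpy can use these numbers, the BNT layout also has to match pgmpy's. For a single parent, `cpts{O(2)}` is indexed as (state of O(1), state of O(2)), so each row is one conditional distribution, whereas pgmpy's `TabularCPD` expects `values` with one row per child state and one column per parent state. A minimal sketch of that transpose, assuming the matrix has already been brought into Python as a nested list (how, e.g. via a .mat file, is left open) and handling only the single-parent case; the function name and numbers are hypothetical:

```python
def bnt_to_pgmpy_values(cpt):
    """Transpose a single-parent BNT-style CPT for use as pgmpy TabularCPD values.

    BNT layout:   cpt[parent_state][child_state]     (each row sums to one)
    pgmpy layout: values[child_state][parent_state]  (each column sums to one)
    Multi-parent CPTs need a different reshaping and are not handled here.
    """
    return [list(column) for column in zip(*cpt)]


# Made-up 3x3 CPT for O(2) given O(1); row 3 has already been made uniform
cpt_o2 = [
    [0.3, 0.7, 0.0],
    [0.6, 0.4, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
]
values = bnt_to_pgmpy_values(cpt_o2)
print(values)
# values could then be passed to pgmpy, e.g.
# TabularCPD("o2", 3, values, evidence=["o1"], evidence_card=[3])
```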

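So my questions are:

1. Is BNT working correctly? Am I not supposed to train it on sparse data?
2. If I want to load these parameters into another package, can I set these "empty" distributions to uniform distributions, i.e. all coefficients equal to 1/n, where n is the number of states the variable can take?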
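Mechanically, the replacement described in question 2 is simple. A minimal pure-Python sketch, assuming the same BNT-style nested-list layout (one conditional distribution per row); the helper name is hypothetical, and the code only performs the substitution without settling whether uniform is the statistically right choice:

```python
def fill_empty_rows_uniform(cpt, tol=1e-12):
    """Return a copy of cpt in which every all-zero row is replaced by 1/n.

    Assumes cpt[parent_state][child_state] layout; rows that already carry
    probability mass are copied unchanged.
    """
    filled = []
    for row in cpt:
        if sum(row) <= tol:  # parent state never seen in the training data
            n = len(row)
            filled.append([1.0 / n] * n)
        else:
            filled.append(list(row))
    return filled


cpt_o2 = [
    [0.3, 0.7, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0],
]
print(fill_empty_rows_uniform(cpt_o2)[2])  # → [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
```

Unlike additive smoothing, this leaves the rows estimated from data exactly as they were and only touches parent states that never occurred.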

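PS: One problem I found is that if learning is done in mini-batches, the learning process seems to completely forget some local conditional probabilities, simply because particular values do not appear in the current batch of data.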
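The question does not show the mini-batch code, so the following is only a sketch of one general remedy, not BNT's procedure: keep running counts (the sufficient statistics of a tabular CPD) across batches and normalise once at the end, so a value absent from one batch does not wipe out what earlier batches contributed. Function names and data are hypothetical; with hidden variables and EM the counts would be expected counts, which this sketch does not cover:

```python
def accumulate_counts(counts, batch, n_parent, n_child):
    """Add one mini-batch of 1-based (parent, child) samples to running counts."""
    if counts is None:
        counts = [[0] * n_child for _ in range(n_parent)]
    for parent, child in batch:
        counts[parent - 1][child - 1] += 1
    return counts


def normalise_rows(counts):
    """Convert counts to P(child | parent); unseen parent states stay all-zero."""
    return [
        [c / sum(row) for c in row] if sum(row) else [0.0] * len(row)
        for row in counts
    ]


# Parent state 1 only appears in the first batch, parent state 2 only in the second
batches = [[(1, 1), (1, 2)], [(2, 2), (2, 2)]]

counts = None
for batch in batches:
    counts = accumulate_counts(counts, batch, n_parent=2, n_child=2)
print(normalise_rows(counts))  # → [[0.5, 0.5], [0.0, 1.0]]

# Re-estimating from the last batch alone "forgets" parent state 1
print(normalise_rows(accumulate_counts(None, batches[-1], 2, 2)))  # → [[0.0, 0.0], [0.0, 1.0]]
```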
