How to compute the total variation distance in Python

Posted 2024-04-27 18:36:52


The quantity in question is the total variation distance between two probability distributions.
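For reference, the standard definition can be written as follows. The symbols P and Q (the two distributions), A (an event), and p and q (their densities, assuming both exist) are notation chosen here, not taken from the original post:

```latex
\delta(P, Q) = \sup_{A} \lvert P(A) - Q(A) \rvert
             = \frac{1}{2} \int \lvert p(x) - q(x) \rvert \, dx
```

For distributions on a finite set of bins, the integral becomes a sum over the bin probabilities, so the distance always lies between 0 and 1.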

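I am trying to compute it in Python. I have two datasets, and I first estimate their probability density functions from histograms. I then take the maximum difference between the two distributions, but the value returned is very small. It looks like I am doing something wrong. Could you help me fix it?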

import numpy as np
import scipy.stats as st
# original: NumPy array of shape [45222, 1]
# synthetic: NumPy array of shape [45222, 1]
summation = 0
minOriginal = min(original)
minGenerated = min(synthetic)

maxOriginal = max(original)
maxGenerated = max(synthetic)

minHist = min(minOriginal, minGenerated)
maxHist = max(maxOriginal, maxGenerated)

originalHist = np.histogram(original, range=(minHist, maxHist))
hist_dist1 = st.rv_histogram(originalHist)

generatedHist = np.histogram(synthetic, range=(minHist, maxHist))
hist_dist2 = st.rv_histogram(generatedHist)

x = np.linspace(minHist, maxHist, 45000)
summation += max(abs(hist_dist1.pdf(x)-hist_dist2.pdf(x)))
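One likely cause of the small value is that the code above takes the maximum pointwise difference between two densities, which is not the total variation distance and depends on the bin width. A minimal sketch of a histogram-based estimate is shown below, assuming both samples are binned on shared edges and the counts are converted to probabilities. The function name `total_variation_distance`, the default of 50 bins, and the randomly generated stand-in data are all assumptions made for illustration, not part of the original post.

```python
import numpy as np

def total_variation_distance(a, b, bins=50):
    """Histogram-based estimate of the total variation distance (hypothetical helper)."""
    a = np.asarray(a, dtype=float).ravel()  # flatten [n, 1] arrays to 1-D
    b = np.asarray(b, dtype=float).ravel()
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Shared bin edges so both histograms cover the same intervals.
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Convert counts to probability masses (each vector sums to 1).
    p = p / p.sum()
    q = q / q.sum()
    # Total variation distance: half the L1 distance between the mass vectors.
    return 0.5 * np.abs(p - q).sum()

# Stand-in data with the same shape as in the question (assumed, not the real data).
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(45222, 1))
synthetic = rng.normal(0.5, 1.0, size=(45222, 1))

print(total_variation_distance(original, original))   # identical samples -> 0.0
print(total_variation_distance(original, synthetic))  # some value in (0, 1]
```

The result depends on the number of bins: too few bins hide differences, while too many make the estimate noisy, so it is worth trying a few values.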
