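I have two lists of different probability distribution functions (pdfs), p(x) and p(y). I know they are correlated, and I want to generate the joint distribution p(x,y) so that I can compute their mutual information.
I did some research and found copula theory in statistics, which apparently is the way to solve my problem. However, even with a copula, I do not know how to generate p(x,y). I tried packages such as "copulalib" and "copula", and so far the best result I have obtained is with "ambhas" (although I had to modify a few things in the class). Here is my code: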
import scipy as sp
import scipy.interpolate
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import rcParams
plt.rcParams.update({'font.size': 10})
rcParams.update({'figure.autolayout': True})
from ambhas.errlib import rmse, correlation
from ambhas.copula import Copula
import seaborn as sns
def log_interp1d(xx, yy, kind='linear'):
    # interpolate linearly in log10-log10 space and map back to linear space
    logx = np.log10(xx)
    logy = np.log10(yy)
    lin_interp = sp.interpolate.interp1d(logx, logy, kind=kind)
    log_interp = lambda zz: np.power(10.0, lin_interp(np.log10(zz)))
    return log_interp
def derivada(f,x):
    # backward finite difference on a uniform grid; returns len(f) - 1 values
    dx = x[1] - x[0]
    flinha = []
    for i in range(1,len(f)):
        flinha.append((f[i]-f[i-1])/dx)
    return flinha
# interpolating the logarithmic data and generating the cumulative distribution (cdf)
xx = [1e-10, 0.00014, 0.00042, 0.0014, 0.0042, 0.014,0.07]
yy = [1e-20, 0.125, 0.275, 0.4711, 0.775, 0.875,1]
f = log_interp1d(xx,yy)
xnew = np.linspace(xx[0], xx[-1], num=10000, endpoint=True)
fda_cerc = f(xnew) #cdf
x_cerc = xnew
xx = [1e-10, 2.1e-6,2.1e-5,2.1e-4,2.1e-3, 0.007, 0.021, 0.049, 0.07]
yy = [1e-20, 0.0583, 0.1, 0.1083, 0.5083, 0.7583, 0.9917, 0.9917, 1]
f = log_interp1d(xx,yy)
fda_imida = f(xnew) #cdf
x_imida = xnew
# generating p(x) and p(y)
pdf_imida = np.array(derivada(fda_imida,x_imida)) #p(x)
pdf_cerconil = np.array(derivada(fda_cerc,x_cerc)) #p(y)
c = correlation(pdf_imida,pdf_cerconil)
foo = Copula(pdf_imida, pdf_cerconil, 'frank')
u,v = foo.generate_uv(9999)
plt.plot(v)
h = sns.jointplot(x=u, y=v, kind='kde')  # recent seaborn versions require x/y as keywords
plt.savefig('copulas.jpg', dpi = 400)
The resulting plot is consistent with copula theory, but what can I do to generate p(x,y)? Is there a simple way?
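For reference, one possible route, given only as a minimal sketch: by Sklar's theorem the joint density factorizes as p(x,y) = c(F(x), G(y)) p(x) p(y), where F and G are the marginal cdfs and c is the copula density. The code below uses the closed-form Frank copula density with NumPy only. The parameter `theta` is assumed to be already known (how to read it from a fitted copula object is not shown here), and the function names and the exponential toy marginals are hypothetical, chosen purely for illustration. The mutual information is computed from the identity I(X;Y) = ∫∫ c(u,v) ln c(u,v) du dv, so it depends only on the copula.

```python
import numpy as np

def frank_copula_density(u, v, theta):
    """Density c(u, v) of the Frank copula; theta != 0 controls the dependence."""
    a = 1.0 - np.exp(-theta)
    num = theta * a * np.exp(-theta * (u + v))
    den = (a - (1.0 - np.exp(-theta * u)) * (1.0 - np.exp(-theta * v))) ** 2
    return num / den

def joint_density(pdf_x, cdf_x, pdf_y, cdf_y, theta):
    """p(x, y) = c(F(x), G(y)) * p(x) * p(y); rows index x, columns index y."""
    U, V = np.meshgrid(cdf_x, cdf_y, indexing="ij")
    return frank_copula_density(U, V, theta) * np.outer(pdf_x, pdf_y)

def mutual_information(theta, n=400):
    """I(X;Y) in nats: integral of c*ln(c) over the unit square (midpoint rule)."""
    mid = (np.arange(n) + 0.5) / n
    U, V = np.meshgrid(mid, mid, indexing="ij")
    c = frank_copula_density(U, V, theta)
    return float(np.sum(c * np.log(c)) / n**2)

# Toy marginals (hypothetical): two unit-rate exponentials on a shared grid.
grid = np.linspace(0.0, 12.0, 600)
toy_pdf = np.exp(-grid)
toy_cdf = 1.0 - np.exp(-grid)

theta = 5.0  # assumed Frank parameter, for illustration only
p_xy = joint_density(toy_pdf, toy_cdf, toy_pdf, toy_cdf, theta)
mi = mutual_information(theta)
```

With the arrays from the question, `pdf_imida` and `pdf_cerconil` would stand in for the toy pdfs and the matching cdf values (trimmed to the same length as the pdfs) for the toy cdfs. Whether a Frank copula is an adequate model for the data is a separate question that this sketch does not address.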