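Generative topographic mapping (GTM), GTM classification and GTM regression in Python
Detailed description of the ugtm Python project
Tutorial
sklearn integration
ugtm v2.0 provides an sklearn-compatible GTM transformer (eGTM), GTM classifier (eGTC) and GTM regressor (eGTR):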
    from ugtm import eGTM, eGTC, eGTR
    import numpy as np

    # Dummy train and test sets
    X_train = np.random.randn(100, 50)
    X_test = np.random.randn(50, 50)
    y_train = np.random.choice([1, 2, 3], size=100)

    # GTM transformer
    transformed = eGTM().fit(X_train).transform(X_test)

    # Predict new labels using the GTM classifier (GTC)
    predicted_labels = eGTC().fit(X_train, y_train).predict(X_test)

    # Predict new continuous outcomes using the GTM regressor (GTR)
    predicted_values = eGTR().fit(X_train, y_train).predict(X_test)
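The following sections show functions that are not defined within the sklearn framework.
Basic functions
ugtm provides implementations of GTM (generative topographic mapping), kGTM (kernel generative topographic mapping), GTM classification models (kNN, Bayes) and GTM regression models. ugtm also implements cross-validation options that can be used to compare GTM classification models with SVM classification models, and GTM regression models with SVM regression models. Typical usage: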
    #!/usr/bin/env python
    import ugtm
    import numpy as np

    # Generate sample data and labels: replace this with your own data
    data = np.random.randn(100, 50)
    labels = np.random.choice([1, 2], size=100)

    # Build GTM map
    gtm = ugtm.runGTM(data=data, verbose=True)

    # Plot GTM map (html)
    gtm.plot_html(output="out")
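For installation instructions, see https://github.com/hagax8/ugtm
Construct and plot GTM maps (or kGTM maps)
A gtm object can be created by running the runGTM function on a dataset. The parameters of runGTM are: k = sqrt(number of nodes), m = sqrt(number of RBF centres), s = RBF width factor, regul = regularization coefficient. By default, the number of iterations of the expectation-maximization algorithm is set to 200. Here is an example with random data: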
    import ugtm
    # Import numpy to generate random data
    import numpy as np

    # Generate random data (independent variables x),
    # discrete labels and continuous labels (dependent variable y),
    # to experiment with categorical or continuous outcomes
    train = np.random.randn(20, 10)
    test = np.random.randn(20, 10)
    labels = np.random.choice(["class1", "class2"], size=20)
    activity = np.random.randn(20, 1)

    # Create a gtm object and write the model
    gtm = ugtm.runGTM(train)
    gtm.write("testout1")

    # Run in verbose mode
    gtm = ugtm.runGTM(train, verbose=True)

    # To run a kernel GTM model instead, run the following:
    gtm = ugtm.runkGTM(train, doKernel=True, kernel="linear")

    # Access coordinates (means or modes) and responsibilities of the gtm object
    gtm_coordinates = gtm.matMeans
    gtm_modes = gtm.matModes
    gtm_responsibilities = gtm.matR
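To make the k parameter and the means, modes and responsibilities more concrete, here is a minimal pure-Python sketch of the textbook GTM definitions: a k x k grid of latent nodes, the mean position of a point as the responsibility-weighted average of node coordinates, and its mode as the node with the highest responsibility. The helper names (make_grid, mean_position, mode_position), the [-1, 1] grid range and the toy responsibilities are assumptions made for illustration; they are not part of the ugtm API and may differ from its internals.

```python
def make_grid(k):
    """Return k*k (x, y) node coordinates evenly spaced in [-1, 1] x [-1, 1]."""
    if k == 1:
        return [(0.0, 0.0)]
    axis = [-1.0 + 2.0 * i / (k - 1) for i in range(k)]
    return [(x, y) for x in axis for y in axis]


def mean_position(responsibilities, nodes):
    """Responsibility-weighted average of the node coordinates."""
    x = sum(r * node[0] for r, node in zip(responsibilities, nodes))
    y = sum(r * node[1] for r, node in zip(responsibilities, nodes))
    return (x, y)


def mode_position(responsibilities, nodes):
    """Coordinates of the node with the highest responsibility."""
    best = max(range(len(nodes)), key=lambda i: responsibilities[i])
    return nodes[best]


k = 2                             # k = sqrt(number of nodes), so 4 nodes here
nodes = make_grid(k)
resp = [0.1, 0.2, 0.3, 0.4]       # toy responsibilities of one point, summing to 1
mean_xy = mean_position(resp, nodes)
mode_xy = mode_position(resp, nodes)
print(len(nodes), mean_xy, mode_xy)
```

The mean gives a smooth position between nodes, while the mode snaps each point to a single node, which is why the two views of the same map can look different.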
Plot html maps
Call the plot_html() function on the gtm object:
    # Run model on train
    gtm = ugtm.runGTM(train)

    # Ex. plot gtm object with landscape, html: labels are continuous
    gtm.plot_html(output="testout10", labels=activity, discrete=False, pointsize=20)

    # Ex. plot gtm object with landscape, html: labels are discrete
    gtm.plot_html(output="testout11", labels=labels, discrete=True, pointsize=20)

    # Ex. plot gtm object with landscape, html: labels are continuous,
    # no interpolation between nodes
    gtm.plot_html(output="testout12", labels=activity, discrete=False, pointsize=20,
                  do_interpolate=False, ids=labels)

    # Ex. plot gtm object with landscape, html: labels are discrete,
    # no interpolation between nodes
    gtm.plot_html(output="testout13", labels=labels, discrete=True, pointsize=20,
                  do_interpolate=False)
Plot pdf maps
Call the plot() function on the gtm object:
    # Run model on train
    gtm = ugtm.runGTM(train)

    # Ex. plot gtm object, pdf: no labels
    gtm.plot(output="testout6", pointsize=20)

    # Ex. plot gtm object with landscape, pdf: labels are discrete
    gtm.plot(output="testout7", labels=labels, discrete=True, pointsize=20)

    # Ex. plot gtm object with landscape, pdf: labels are continuous
    gtm.plot(output="testout8", labels=activity, discrete=False, pointsize=20)
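Plot multipanel views
Call the plot_multipanel() function on the gtm object. This plots a general view of the model, showing means, modes and the landscape, with or without points. The plot_multipanel function only works if labels are defined: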
    # Run model on train
    gtm = ugtm.runGTM(train)

    # Ex. with discrete labels and inter-node interpolation
    gtm.plot_multipanel(output="testout2", labels=labels, discrete=True, pointsize=20)

    # Ex. with continuous labels and inter-node interpolation
    gtm.plot_multipanel(output="testout3", labels=activity, discrete=False, pointsize=20)

    # Ex. with discrete labels and no inter-node interpolation
    gtm.plot_multipanel(output="testout4", labels=labels, discrete=True, pointsize=20,
                        do_interpolate=False)

    # Ex. with continuous labels and no inter-node interpolation
    gtm.plot_multipanel(output="testout5", labels=activity, discrete=False, pointsize=20,
                        do_interpolate=False)
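New data can be projected onto a GTM map with the transform() function, which takes a gtm model, a training set and a test set as input. The training set is only used to preprocess the test set in the same way as the training data (for example, applying the same PCA transformation to the training and test sets before running the algorithm):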
    # Run model on train
    gtm = ugtm.runGTM(train, doPCA=True)

    # Project new data (test)
    transformed = ugtm.transform(optimizedModel=gtm, train=train, test=test, doPCA=True)

    # Plot transformed test (html)
    transformed.plot_html(output="testout14", pointsize=20)

    # Plot transformed test (pdf)
    transformed.plot(output="testout15", pointsize=20)

    # Plot transformed data on existing classification model,
    # using training set labels
    gtm.plot_html_projection(output="testout16", projections=transformed,
                             labels=labels, discrete=True, pointsize=20)
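The reason transform() needs the training set can be illustrated with a much simpler preprocessing step than PCA. The sketch below fits column means on the training rows only and then applies the same shift to both sets. Mean-centering is used here as a stand-in for PCA, and the helper names (fit_centering, apply_centering) and toy values are hypothetical, not ugtm functions.

```python
def fit_centering(train_rows):
    """Learn per-column means from the training set only."""
    n_rows = len(train_rows)
    n_cols = len(train_rows[0])
    return [sum(row[j] for row in train_rows) / n_rows for j in range(n_cols)]


def apply_centering(rows, column_means):
    """Subtract the training-set means from every row (train or test)."""
    return [[value - mean for value, mean in zip(row, column_means)]
            for row in rows]


train_rows = [[1.0, 10.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]

column_means = fit_centering(train_rows)            # fitted on train only
train_centered = apply_centering(train_rows, column_means)
test_centered = apply_centering(test_rows, column_means)
print(column_means, test_centered)
```

Fitting the preprocessing on the test set instead would place test points in a different coordinate system from the one the map was trained in.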
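7. Output predictions for a test set: GTM regression (GTR) and classification (GTC)
The GTR() function implements the GTM regression model (see references) and the GTC() function implements the GTM classification model (see references):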
    # Continuous labels (prediction by GTM regression model)
    predicted = ugtm.GTR(train=train, test=test, labels=activity)

    # Discrete labels (prediction by GTM classification model)
    predicted = ugtm.GTC(train=train, test=test, labels=labels)
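When true outcomes are available for the test set, predictions can be scored with standard metrics. The following is a small pure-Python sketch, assuming the predictions can be treated as flat sequences of labels or numbers; the helper names (accuracy, rmse) and the toy values are illustrative and do not come from ugtm.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true class labels."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


def rmse(true_values, predicted_values):
    """Root mean squared error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Toy classification outcome: three of four labels are correct
true_classes = ["class1", "class2", "class1", "class2"]
predicted_classes = ["class1", "class2", "class2", "class2"]
class_accuracy = accuracy(true_classes, predicted_classes)

# Toy regression outcome: only the last value is off, by 2.0
true_activity = [0.0, 1.0, 2.0]
predicted_activity = [0.0, 1.0, 4.0]
activity_rmse = rmse(true_activity, predicted_activity)
print(class_accuracy, activity_rmse)
```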
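8. Advanced GTM predictions with per-class probabilities
Per-class probabilities for a test set can be obtained with the advancedGTC() function (you can set the m, k, regul and s parameters just as with runGTM):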
    # Get whole output model and label predictions for test set
    predicted_model = ugtm.advancedGTC(train=train, test=test, labels=labels)

    # Write whole predicted model with per-class probabilities
    ugtm.printClassPredictions(predicted_model, "testout17")
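As an illustration of where per-class probabilities can come from, the sketch below shows one simple Bayes-style scheme: each node gets class probabilities from the responsibilities of the labelled training points, and a test point's class probabilities are the responsibility-weighted sum over nodes. This is an assumed formulation written for this example, with hypothetical helper names and toy responsibilities; it is not taken from the ugtm source and may differ from what advancedGTC() computes.

```python
def node_class_probabilities(train_resp, train_labels, classes):
    """For each node, estimate P(class | node) from training responsibilities."""
    n_nodes = len(train_resp[0])
    table = []
    for node in range(n_nodes):
        mass = {c: 0.0 for c in classes}
        for resp, label in zip(train_resp, train_labels):
            mass[label] += resp[node]
        total = sum(mass.values())
        if total > 0:
            table.append({c: mass[c] / total for c in classes})
        else:
            # Empty node: fall back to a uniform distribution
            table.append({c: 1.0 / len(classes) for c in classes})
    return table


def predict_proba(test_resp, table, classes):
    """P(class | point) = sum over nodes of responsibility * P(class | node)."""
    return {c: sum(r * table[node][c] for node, r in enumerate(test_resp))
            for c in classes}


classes = ["class1", "class2"]
# Toy responsibilities of three training points over two nodes
train_resp = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
train_labels = ["class1", "class2", "class2"]

table = node_class_probabilities(train_resp, train_labels, classes)
proba = predict_proba([0.6, 0.4], table, classes)
print(proba)
```

Because the test responsibilities sum to 1 and each node's class probabilities sum to 1, the resulting per-class probabilities also sum to 1.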
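9. Cross-validation experiments
Different cross-validation experiments are provided to compare GTC and GTR models with classical machine learning methods: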
    # Cross-validation experiment: GTM classification model implemented in ugtm,
    # here: set hyperparameters s=1 and regul=1 (set to -1 to optimize)
    ugtm.crossvalidateGTC(data=train, labels=labels, s=1, regul=1,
                          n_repetitions=10, n_folds=5)

    # Cross-validation experiment: GTM regression model
    ugtm.crossvalidateGTR(data=train, labels=activity, s=1, regul=1)

    # You can also run the following functions to compare
    # with other classification/regression algorithms:

    # Cross-validation experiment, k-nearest neighbours classification
    # on 2D PCA map with 7 neighbours (set to -1 to optimize number of neighbours)
    ugtm.crossvalidatePCAC(data=train, labels=labels, n_neighbors=7)

    # Cross-validation experiment, SVC rbf classification model (sklearn implementation)
    ugtm.crossvalidateSVCrbf(data=train, labels=labels, C=1, gamma=1)

    # Cross-validation experiment, linear SVC classification model (sklearn implementation)
    ugtm.crossvalidateSVC(data=train, labels=labels, C=1)

    # Cross-validation experiment, linear SVR regression model (sklearn implementation)
    ugtm.crossvalidateSVR(data=train, labels=activity, C=1, epsilon=1)

    # Cross-validation experiment, k-nearest neighbours regression
    # on 2D PCA map with 7 neighbours
    ugtm.crossvalidatePCAR(data=train, labels=activity, n_neighbors=7)
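The n_repetitions and n_folds arguments describe a repeated k-fold scheme. The sketch below shows in plain Python what such a scheme does to the sample indices, assuming simple shuffled, non-stratified folds; the function name repeated_kfold_indices and the seed argument are hypothetical, and ugtm's own splitting may differ (for example, it may stratify by class).

```python
import random


def repeated_kfold_indices(n_samples, n_folds, n_repetitions, seed=0):
    """Yield (repetition, fold, train_indices, test_indices) tuples."""
    rng = random.Random(seed)
    for repetition in range(n_repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repetition
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold_id, test_idx in enumerate(folds):
            # Training indices are all folds except the held-out one
            train_idx = [i for other_id, fold in enumerate(folds)
                         if other_id != fold_id for i in fold]
            yield repetition, fold_id, train_idx, test_idx


splits = list(repeated_kfold_indices(n_samples=20, n_folds=5, n_repetitions=10))
print(len(splits))  # one entry per (repetition, fold) pair
```

With 20 samples, 5 folds and 10 repetitions, each model is trained 50 times on 16 samples and evaluated on the remaining 4.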
10. Links and references
- GTM algorithm by Bishop et al.: https://www.microsoft.com/en-us/research/wp-content/uploads/1998/01/bishop-gtm-ncomp-98.pdf
- Kernel GTM: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2010-44.pdf
- GTM classification models: https://www.ncbi.nlm.nih.gov/pubmed/24320683
- GTM regression models: https://www.ncbi.nlm.nih.gov/pubmed/27490381
- GitHub: https://github.com/hagax8/ugtm