Python Generative Topographic Mapping (GTM), GTM classification and GTM regression

ugtm: detailed description of the Python project


Tutorial

sklearn integration

ugtm v2.0 provides an sklearn-compatible GTM transformer (eGTM), GTM classifier (eGTC) and GTM regressor (eGTR):

from ugtm import eGTM, eGTC, eGTR
import numpy as np

# Dummy train and test
X_train = np.random.randn(100, 50)
X_test = np.random.randn(50, 50)
y_train = np.random.choice([1, 2, 3], size=100)

# GTM transformer
transformed = eGTM().fit(X_train).transform(X_test)

# Predict new labels using GTM classifier (GTC)
predicted_labels = eGTC().fit(X_train, y_train).predict(X_test)

# Predict new continuous outcomes using GTM regressor (GTR)
# (y_train holds dummy integer values here; use real continuous targets in practice)
predicted_values = eGTR().fit(X_train, y_train).predict(X_test)

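The following sections show functions that are not defined within the sklearn framework.

Basic functions

ugtm provides implementations of GTM (Generative Topographic Mapping), kGTM (kernel Generative Topographic Mapping), GTM classification models (kNN, Bayes) and GTM regression models. ugtm also implements cross-validation options, which can be used to compare GTM classification models with SVM classification models, and GTM regression models with SVM regression models. Typical usage: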

#!/usr/bin/env python

import ugtm
import numpy as np

#generate sample data and labels: replace this with your own data
data=np.random.randn(100,50)
labels=np.random.choice([1,2],size=100)

#build GTM map
gtm=ugtm.runGTM(data=data,verbose=True)

#plot GTM map (html)
gtm.plot_html(output="out")

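For installation instructions, see https://github.com/hagax8/ugtm

Construct and plot a GTM map (or kGTM map)

A gtm object can be created by running the runGTM function on a dataset. The parameters of runGTM are: k = sqrt(number of nodes), m = sqrt(number of RBF centres), s = RBF width factor, regul = regularization coefficient. By default, the number of iterations of the expectation-maximization algorithm is set to 200. Here is an example with random data: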

import ugtm

#import numpy to generate random data
import numpy as np

#generate random data (independent variables x),
#discrete labels (dependent variable y),
#and continuous labels (dependent variable y),
#to experiment with categorical or continuous outcomes

train = np.random.randn(20,10)
test = np.random.randn(20,10)
labels=np.random.choice(["class1","class2"],size=20)
activity=np.random.randn(20,1)

#create a gtm object and write model
gtm = ugtm.runGTM(train)
gtm.write("testout1")

#run verbose
gtm = ugtm.runGTM(train, verbose=True)

#to run a kernel GTM model instead, run following:
gtm = ugtm.runkGTM(train, doKernel=True, kernel="linear")

#access coordinates (means or modes), and responsibilities of gtm object
gtm_coordinates = gtm.matMeans
gtm_modes = gtm.matModes
gtm_responsibilities = gtm.matR
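
To make the meaning of k and of the three attributes above more concrete, here is a minimal NumPy sketch of how mean and mode coordinates are conventionally derived from a responsibility matrix in GTM. It does not call ugtm: the helper names square_grid and means_and_modes, the [-1, 1] grid range and the random responsibilities are all assumptions made for illustration, and ugtm's internal layout may differ.

```python
import numpy as np


def square_grid(k):
    """Return the k*k node coordinates of a regular grid on [-1, 1]^2 (assumed layout)."""
    axis = np.linspace(-1.0, 1.0, k)
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


def means_and_modes(responsibilities, nodes):
    """Compute mean and mode map positions from an (n_samples, n_nodes) matrix."""
    # Mean position: responsibility-weighted average of the node coordinates.
    means = responsibilities @ nodes
    # Mode position: coordinates of the single most responsible node.
    modes = nodes[np.argmax(responsibilities, axis=1)]
    return means, modes


# Hypothetical responsibilities for 5 samples on a 4x4 map (each row sums to 1).
rng = np.random.default_rng(0)
k = 4
nodes = square_grid(k)
raw = rng.random((5, k * k))
R = raw / raw.sum(axis=1, keepdims=True)

means, modes = means_and_modes(R, nodes)
print(means.shape, modes.shape)
```

With k = 4 the map has 16 nodes, so each sample gets 16 responsibilities but only one 2D mean and one 2D mode.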

Plot HTML maps

Call the plot_html() function on the gtm object:

#run model on train
gtm = ugtm.runGTM(train)

# ex. plot gtm object with landscape, html: labels are continuous
gtm.plot_html(output="testout10",labels=activity,discrete=False,pointsize=20)

# ex. plot gtm object with landscape, html: labels are discrete
gtm.plot_html(output="testout11",labels=labels,discrete=True,pointsize=20)

# ex. plot gtm object with landscape, html: labels are continuous
# no interpolation between nodes
gtm.plot_html(output="testout12",labels=activity,discrete=False,pointsize=20, \
              do_interpolate=False,ids=labels)

# ex. plot gtm object with landscape, html: labels are discrete,
# no interpolation between nodes
gtm.plot_html(output="testout13",labels=labels,discrete=True,pointsize=20, \
              do_interpolate=False)

Plot PDF maps

Call the plot() function on the gtm object:

#run model on train
gtm = ugtm.runGTM(train)

# ex. plot gtm object, pdf: no labels
gtm.plot(output="testout6",pointsize=20)

# ex. plot gtm object with landscape, pdf: labels are discrete
gtm.plot(output="testout7",labels=labels,discrete=True,pointsize=20)

# ex. plot gtm object with landscape, pdf: labels are continuous
gtm.plot(output="testout8",labels=activity,discrete=False,pointsize=20)

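Plot multipanel views

Call the plot_multipanel() function on the gtm object. This plots a general view of the model, showing means, modes and the landscape with or without points. The plot_multipanel function only works if labels are defined: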

#run model on train
gtm = ugtm.runGTM(train)

# ex. with discrete labels and inter-node interpolation
gtm.plot_multipanel(output="testout2",labels=labels,discrete=True,pointsize=20)

# ex. with continuous labels and inter-node interpolation
gtm.plot_multipanel(output="testout3",labels=activity,discrete=False,pointsize=20)

# ex. with discrete labels and no inter-node interpolation
gtm.plot_multipanel(output="testout4",labels=labels,discrete=True,pointsize=20, \
                    do_interpolate=False)

# ex. with continuous labels and no inter-node interpolation
gtm.plot_multipanel(output="testout5",labels=activity,discrete=False,pointsize=20, \
                    do_interpolate=False)
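
Project new data onto an existing GTM map

New data can be projected onto the GTM map with the transform() function, which takes as input the gtm model, a training set and a test set. The training set is only used to preprocess the test set in the same way as the training data (for example, applying the same PCA transformation to the training and test sets before running the algorithm):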

#run model on train
gtm = ugtm.runGTM(train,doPCA=True)

#test new data (test)
transformed=ugtm.transform(optimizedModel=gtm,train=train,test=test,doPCA=True)

#plot transformed test (html)
transformed.plot_html(output="testout14",pointsize=20)

#plot transformed test (pdf)
transformed.plot(output="testout15",pointsize=20)

#plot transformed data on existing classification model,
#using training set labels
gtm.plot_html_projection(output="testout16",projections=transformed,\
                         labels=labels, \
                         discrete=True,pointsize=20)
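
The reason transform() needs the training set can be illustrated with a small NumPy sketch of train-based preprocessing: the PCA is fitted on the training data only, and the same centering and axes are then reused for the test data. This is not ugtm's code; fit_pca and apply_pca are hypothetical helpers, and the choice of 3 components is arbitrary.

```python
import numpy as np


def fit_pca(train, n_components):
    """Learn the centering vector and principal axes from the training set only."""
    mean = train.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(data, mean, components):
    """Project data with a transformation that was fitted elsewhere."""
    return (data - mean) @ components.T


rng = np.random.default_rng(0)
train = rng.standard_normal((20, 10))
test = rng.standard_normal((20, 10))

mean, components = fit_pca(train, n_components=3)
train_reduced = apply_pca(train, mean, components)
test_reduced = apply_pca(test, mean, components)  # reuses the train statistics
print(train_reduced.shape, test_reduced.shape)
```

Fitting a separate PCA on the test set would place it in a different coordinate system, so its projections could not be compared with the map built from the training data.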

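Output predictions for a test set: GTM regression (GTR) and classification (GTC)

The GTR() function implements the GTM regression model (reference) and the GTC() function implements the GTM classification model (reference):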

#continuous labels (prediction by GTM regression model)
predicted=ugtm.GTR(train=train,test=test,labels=activity)

#discrete labels (prediction by GTM classification model)
predicted=ugtm.GTC(train=train,test=test,labels=labels)
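
As a rough intuition for map-based regression, one common approach in the GTM literature is to build an activity landscape on the nodes and read predictions from it. The NumPy sketch below shows that idea under stated assumptions: node_landscape and predict_activity are hypothetical helpers, the responsibilities are random, and this is not necessarily what ugtm.GTR does internally.

```python
import numpy as np


def node_landscape(train_resp, activity):
    """Responsibility-weighted average activity for each node."""
    weights = train_resp.sum(axis=0)      # total responsibility per node
    weighted = train_resp.T @ activity    # activity mass per node
    return weighted / np.maximum(weights, 1e-12)  # guard against empty nodes


def predict_activity(test_resp, landscape):
    """Average the landscape under each test sample's responsibilities."""
    return test_resp @ landscape


def random_responsibilities(rng, n_samples, n_nodes):
    """Hypothetical responsibilities: positive rows that sum to 1."""
    raw = rng.random((n_samples, n_nodes))
    return raw / raw.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
R_train = random_responsibilities(rng, 6, 4)
R_test = random_responsibilities(rng, 3, 4)
activity = rng.standard_normal(6)

landscape = node_landscape(R_train, activity)
predictions = predict_activity(R_test, landscape)
print(predictions.shape)
```

Because every step is a weighted average, predictions made this way always stay within the range of the training activities.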

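Advanced GTM predictions with per-class probabilities

Per-class probabilities for a test set can be obtained with the advancedGTC() function (you can set the m, k, regul and s parameters just as with runGTM):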

#get whole output model and label predictions for test set
predicted_model=ugtm.advancedGTC(train=train,test=test,labels=labels)

#write whole predicted model with per-class probabilities
ugtm.printClassPredictions(predicted_model,"testout17")
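
The following NumPy sketch illustrates one way per-class probabilities can be obtained from a GTM, in the spirit of a Bayes-style classifier: class probabilities are first estimated per node from the training responsibilities, then averaged under each test sample's responsibilities. The helper names and the random responsibilities are assumptions for illustration; the output of advancedGTC() may be computed differently.

```python
import numpy as np


def class_given_node(train_resp, labels):
    """Estimate P(class | node) from training responsibilities."""
    classes = np.unique(labels)
    # Responsibility mass each class puts on each node: (n_classes, n_nodes).
    mass = np.stack([train_resp[labels == c].sum(axis=0) for c in classes])
    # Normalise per node so each column is a distribution over classes.
    return classes, mass / np.maximum(mass.sum(axis=0, keepdims=True), 1e-12)


def class_probabilities(test_resp, node_probs):
    """P(class | sample) as a responsibility-weighted mix of node distributions."""
    return test_resp @ node_probs.T


def random_responsibilities(rng, n_samples, n_nodes):
    """Hypothetical responsibilities: positive rows that sum to 1."""
    raw = rng.random((n_samples, n_nodes))
    return raw / raw.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
R_train = random_responsibilities(rng, 6, 4)
R_test = random_responsibilities(rng, 3, 4)
labels = np.array(["class1", "class2", "class1", "class2", "class1", "class2"])

classes, node_probs = class_given_node(R_train, labels)
probs = class_probabilities(R_test, node_probs)
predicted = classes[np.argmax(probs, axis=1)]
print(probs.shape, predicted)
```

Each row of probs is a distribution over the classes, and the predicted label is simply the class with the highest probability.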

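Cross-validation experiments

Several cross-validation experiments are implemented to compare GTC and GTR models with classical machine learning methods: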

#crossvalidation experiment: GTM classification model implemented in ugtm,
#here: set hyperparameters s=1 and regul=1 (set to -1 to optimize)
ugtm.crossvalidateGTC(data=train,labels=labels,s=1,regul=1,n_repetitions=10,n_folds=5)

#crossvalidation experiment: GTM regression model
ugtm.crossvalidateGTR(data=train,labels=activity,s=1,regul=1)

#you can also run the following functions to compare
#with other classification/regression algorithms:

#crossvalidation experiment, k-nearest neighbours classification
#on 2D PCA map with 7 neighbors (set to -1 to optimize number of neighbours)
ugtm.crossvalidatePCAC(data=train,labels=labels,n_neighbors=7)

#crossvalidation experiment, SVC rbf classification model (sklearn implementation):
ugtm.crossvalidateSVCrbf(data=train,labels=labels,C=1,gamma=1)

#crossvalidation experiment, linear SVC classification model (sklearn implementation):
ugtm.crossvalidateSVC(data=train,labels=labels,C=1)

#crossvalidation experiment, linear SVR regression model (sklearn implementation):
ugtm.crossvalidateSVR(data=train,labels=activity,C=1,epsilon=1)

#crossvalidation experiment, k-nearest neighbours regression on 2D PCA map with 7 neighbors:
ugtm.crossvalidatePCAR(data=train,labels=activity,n_neighbors=7)
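
To clarify what n_repetitions and n_folds control, here is a minimal sketch of repeated k-fold splitting written with NumPy. The repeated_kfold generator is a hypothetical helper, not part of ugtm, and the library may split the data differently (for example with stratification).

```python
import numpy as np


def repeated_kfold(n_samples, n_folds, n_repetitions, seed=0):
    """Yield (repetition, train_idx, test_idx) for repeated shuffled k-fold splits."""
    rng = np.random.default_rng(seed)
    for rep in range(n_repetitions):
        # A fresh shuffle per repetition gives different fold memberships.
        order = rng.permutation(n_samples)
        folds = np.array_split(order, n_folds)
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield rep, train_idx, test_idx


splits = list(repeated_kfold(n_samples=20, n_folds=5, n_repetitions=10))
print(len(splits))  # 50 train/test splits
```

Each repetition uses every sample exactly once as test data, so averaging scores over all repetitions reduces the dependence on one particular shuffle.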
