Topological Data Analysis: where to start

Published 2024-05-18 20:47:05


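I recently discovered "Topological Data Analysis" (TDA) as a distinctive way to visualize large data sets. Here is a paper from Stanford with example output near the end: https://research.math.osu.edu/tgda/mapperPBG.pdf

I would like to produce similar results, but I am having trouble finding runnable code on the web of the kind where you install a package, load sample data, and execute a few lines (like the examples at http://scikit-learn.org/). My language preference is Python, but I could also use R.

Has anyone been able to get traction with TDA? If so, what advice do you have on getting up and running with code?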


3 Answers

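For visualization, Cytoscape has both desktop and browser versions.

It suggests two libraries for generating the graphs: Bioconductor (an R-based project) and igraph (which has both R and Python interfaces).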
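As a rough sketch of how a graph could be handed to Cytoscape without either library, the snippet below writes an edge list in the plain-text SIF (Simple Interaction Format) layout that Cytoscape can import. The function name `to_sif`, the node names, and the relation label `linked` are made up for illustration and do not come from the original answer.

```python
def to_sif(edges, isolated=(), relation="linked"):
    """Render a graph as SIF text.

    Each edge becomes one line "source<TAB>relation<TAB>target";
    a node with no edges is written on a line by itself.
    """
    lines = [f"{a}\t{relation}\t{b}" for a, b in edges]
    lines.extend(str(node) for node in isolated)
    return "\n".join(lines) + "\n"


# Hypothetical example graph: three connected clusters and one isolated cluster.
edges = [("c0", "c1"), ("c1", "c2")]
sif_text = to_sif(edges, isolated=["c3"])
print(sif_text, end="")
```

Saving `sif_text` to a file with a `.sif` extension should give something the desktop application can load through its network import dialog.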

There is a new R package:

TDA: Statistical Tools for Topological Data Analysis
The package provides tools for the statistical analysis of persistent homology and for density clustering.

A well-written vignette can be found here: Introduction to the R package TDA

Abstract

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.
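To make two of the functions named in the abstract concrete, here is a small Python sketch of the distance function and the empirical distance to a measure (DTM). It is not the R package's code. The DTM formula used, the root mean squared distance to the k = ceil(m0 * n) nearest sample points, is my reading of the usual definition and should be treated as an assumption. The sample points are invented.

```python
from math import ceil, dist, sqrt


def distance_function(x, sample):
    """Distance from x to the closest point of the sample."""
    return min(dist(x, p) for p in sample)


def distance_to_measure(x, sample, m0):
    """Empirical DTM: root mean squared distance from x to its
    k = ceil(m0 * n) nearest sample points (assumed definition)."""
    k = ceil(m0 * len(sample))
    nearest = sorted(dist(x, p) for p in sample)[:k]
    return sqrt(sum(r * r for r in nearest) / k)


# Three clustered points and one outlier at (10, 10).
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
outlier = (10.0, 10.0)

df_value = distance_function(outlier, sample)
dtm_value = distance_to_measure(outlier, sample, m0=0.5)
print(df_value, dtm_value)
```

The distance function is zero at the outlier because the outlier is itself a sample point, while the DTM stays large there because it averages over several neighbours. That difference is why the DTM is often described as the more outlier-robust of the two.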

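Dionysus is a C++ library for computing persistent homology. It has a nice PyBind wrapper, which makes it very easy to use from Python.

Dionysus version 2 came out recently, and it includes plotting functionality, which should make it easier to dive in. Take a look here: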

http://www.mrzv.org/software/dionysus2/tutorial/plotting.html

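Starting from a generic data set in Euclidean space (for example, a 2D or 3D array), building a Rips complex is probably a good entry point. It is explained here: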

http://www.mrzv.org/software/dionysus2/tutorial/rips.html
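For readers who want to see what a Rips construction does before installing anything, the following is a minimal pure-Python sketch. It does not use the Dionysus API. It builds the Vietoris-Rips filtration up to triangles and then tracks only connected components (0-dimensional persistence) with union-find. The function names, the sample points, and the convention that a simplex appears once all its pairwise distances are at most the threshold are assumptions for illustration. A library such as Dionysus also handles higher dimensions and does so far more efficiently.

```python
from itertools import combinations
from math import dist


def rips_complex(points, max_distance):
    """Return (simplex, value) pairs for vertices, edges and triangles.

    A simplex enters the filtration at the largest pairwise distance
    among its vertices; simplices above max_distance are discarded.
    """
    n = len(points)
    d = {pair: dist(points[pair[0]], points[pair[1]])
         for pair in combinations(range(n), 2)}
    simplices = [((i,), 0.0) for i in range(n)]
    for size in (2, 3):
        for s in combinations(range(n), size):
            value = max(d[pair] for pair in combinations(s, 2))
            if value <= max_distance:
                simplices.append((s, value))
    # Sort by value, then by size, so faces come before the simplices they bound.
    simplices.sort(key=lambda sv: (sv[1], len(sv[0])))
    return simplices


def h0_persistence(simplices, n_points):
    """(birth, death) pairs for connected components via union-find."""
    parent = list(range(n_points))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for simplex, value in simplices:
        if len(simplex) != 2:
            continue  # only edges can merge components
        a, b = find(simplex[0]), find(simplex[1])
        if a != b:
            parent[a] = b
            pairs.append((0.0, value))  # one component dies at this edge
    # Components that never merge persist forever.
    roots = {find(i) for i in range(n_points)}
    pairs.extend((0.0, float("inf")) for _ in roots)
    return pairs


# Two well-separated clusters in the plane (invented data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
simplices = rips_complex(points, max_distance=2.0)
pairs = h0_persistence(simplices, len(points))
print(len(simplices), pairs)
```

In the resulting pairs, the components that never die correspond to the two clusters, which is the kind of feature a persistence diagram makes visible.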
