clustermatch: Python package - PyPI

An efficient clustering method for highly diverse data

Detailed description of the clustermatch Python project

Clustermatch

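Title: Clustermatch: discovering hidden relations in highly diverse kinds of qualitative and quantitative data without standardization
Authors: Milton Pividori*, Andrés Cernadas, Luis de Haro, Fernando Carrari, Georgina Stegmayer and Diego Milone

Research Institute for Signals, Systems and Computational Intelligence

*Corresponding author: mpividori@sinc.unl.edu.ar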

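Description

Clustermatch is an efficient clustering method for highly diverse data. It can handle very different data types in the presence of linear or nonlinear relationships and noise, without any prior preprocessing. The article describing the method has been submitted for publication.

If you want to test Clustermatch quickly, you can do so from here.

Mirrors:

GitHub: https://github.com/sinc-lab/clustermatch
Bitbucket (Mercurial): https://bitbucket.org/sinc-lab/clustermatch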

Installation

You can install Clustermatch by running:

$ pip install clustermatch

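This installs a command-line utility (run clustermatch -h for usage instructions). It is considered alpha and is still under development. Follow the instructions below if you want to create your own environment and run Clustermatch through the Python API.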
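As a quick sanity check after installing, the following minimal sketch looks for the clustermatch executable on the PATH using only the standard library. The helper name find_cli is made up for this example; it is not part of Clustermatch.

```python
import shutil


def find_cli(name="clustermatch"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


path = find_cli()
if path:
    print(f"clustermatch found at {path}")
else:
    # pip may have placed the script in a directory that is not on PATH.
    print("clustermatch not found on PATH")
```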

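Clustermatch works with Python 3.6 (it should also work with 3.5). You also need a C compiler (such as gcc) to install minepy and run the simulations, although it is not required to use Clustermatch itself. On Ubuntu you can install gcc by running: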

$ sudo apt-get install build-essential

We recommend using the Anaconda/Miniconda distribution. After installing conda, move to the folder where Clustermatch was unpacked and run the following steps:

$ conda env create -n cm -f environment.yml
$ conda activate cm

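This creates a conda environment named cm. The last step activates the environment. You can run the test suite to make sure everything works on your system: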

$ python -m unittest discover .
......................................................................

Ran 92 tests in 47.056s

OK

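Keep in mind that if you want to fully reproduce the results in the manuscript, you need to install the full environment from the file environment_full.yml, which has additional dependencies. The one we used before (environment.yml) has the minimal set of packages needed to run Clustermatch.

Reproducing results

You can see how the method behaves on an artificial dataset with several linear and nonlinear transformations (replace {CLUSTERMATCH_FOLDER} with the path to the Clustermatch folder):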

$ export PYTHONPATH={CLUSTERMATCH_FOLDER}
$ cd {CLUSTERMATCH_FOLDER}/experiments
$ python main.py --data-transf transform_rows_nonlinear03 --noise-perc 45 --n-jobs 4 --n-reps 1 --n-features 50
Running now:
{"clustering_algorithm": "spectral",
  "clustering_metric": "ari",
  "data_generator": "Blobs (data_seed_mode=False). n_features=50, n_samples=1000, centers=3, cluster_std=0.10, center_box=(-1.0, 1.0)",
  "data_noise": {"magnitude": 0.0,
    "percentage_measures": 0.0,
    "percentage_objects": 0.45
  },
  "data_transform": "Nonlinear row transformation 03. 10 simulated data sources; Functions: x^4, log, exp2, 100, log1p, x^5, 10000, log10, 0.0001, log2",
  "k_final": null,
  "n_reps": 1}

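The script arguments are: the data transformation function (--data-transf transform_rows_nonlinear03), the noise percentage (--noise-perc 45), the number of cores to use (--n-jobs 4) and the number of repetitions (--n-reps 1). We use only 1 repetition and 50 features (--n-features 50) to speed up the experiment. If you want to run this experiment in full, as in the manuscript (Figure 3), use this command (for all noise levels):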

python main.py --data-transf transform_rows_nonlinear03 --noise-perc 45 --n-jobs 4 --n-reps 20
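Running the full experiment for several noise levels means repeating the command with a different --noise-perc value each time. The sketch below builds one command per noise level using only the flags shown above. The noise levels in EXAMPLE_NOISE_LEVELS are placeholders, not the values used in the manuscript, and build_commands is a helper written for this example.

```python
import shlex


def build_commands(noise_levels, transform="transform_rows_nonlinear03",
                   n_jobs=4, n_reps=20):
    """Build one main.py invocation (as an argument list) per noise percentage."""
    commands = []
    for noise in noise_levels:
        commands.append([
            "python", "main.py",
            "--data-transf", transform,
            "--noise-perc", str(noise),
            "--n-jobs", str(n_jobs),
            "--n-reps", str(n_reps),
        ])
    return commands


# Placeholder values for illustration only; replace with the levels you need.
EXAMPLE_NOISE_LEVELS = [0, 15, 30, 45]

for cmd in build_commands(EXAMPLE_NOISE_LEVELS):
    # Print the command; shlex.join requires Python 3.8 or later.
    print(shlex.join(cmd))
```

To execute the commands instead of printing them, each argument list can be passed to subprocess.run(cmd, check=True) from inside the experiments folder.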

When it finishes, you will find the output in the directory results_transform_rows_nonlinear03_0.45/{TIMESTAMP}/:

$ cat results_transform_rows_nonlinear03_0.45/20180829_161133/output000.txt

[...]

method              ('metric', 'mean')    ('metric', 'std')    ('time', 'mean')
----------------  --------------------  -------------------  ------------------
00. Clustermatch                  1.00                  nan               31.56
01. SC-Pearson                    0.11                  nan                0.33
02. SC-Spearman                   0.29                  nan                0.67
03. SC-DC                         0.45                  nan               37.19
04. SC-MIC                        0.88                  nan               45.73
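To compare methods programmatically, the summary table can be read with a few lines of Python. This is a minimal sketch that assumes the layout shown above (an index such as "00.", the method name, then three numeric columns); parse_results is a helper written for this example, not part of Clustermatch. The std column is nan in this run, which is consistent with a single repetition.

```python
def parse_results(table_text):
    """Parse the summary table into {method: (metric_mean, metric_std, time_mean)}."""
    results = {}
    for line in table_text.splitlines():
        parts = line.split()
        # Data rows start with an index such as "00." and end with three numbers;
        # the header and the dashed separator line are skipped.
        if len(parts) < 5 or not parts[0].rstrip(".").isdigit():
            continue
        method = " ".join(parts[1:-3])
        metric_mean, metric_std, time_mean = (float(p) for p in parts[-3:])
        results[method] = (metric_mean, metric_std, time_mean)
    return results


SAMPLE = """\
method              ('metric', 'mean')    ('metric', 'std')    ('time', 'mean')
----------------  --------------------  -------------------  ------------------
00. Clustermatch                  1.00                  nan               31.56
01. SC-Pearson                    0.11                  nan                0.33
02. SC-Spearman                   0.29                  nan                0.67
03. SC-DC                         0.45                  nan               37.19
04. SC-MIC                        0.88                  nan               45.73
"""

results = parse_results(SAMPLE)
# Pick the method with the highest mean metric.
best = max(results, key=lambda name: results[name][0])
print(best, results[best][0])
```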

Usage

If you installed the command-line utility (clustermatch), you can run it as follows:

$ cd {CLUSTERMATCH_FOLDER}
$ clustermatch -i experiments/tomato/data/real_sample.xlsx -k 3 -o partition.xls

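The file partition.xls will contain the partition of the data (real_sample.xlsx). See the help (clustermatch -h) for more options.

You can also try the method from Python by loading the sample data used in the manuscript. To do so, follow these instructions:

$ cd {CLUSTERMATCH_FOLDER}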
$ ipython

In [1]: from utils.data import merge_sources
In [2]: from clustermatch.cluster import calculate_simmatrix, get_partition_spectral
In [3]: data_files = ['experiments/tomato/data/real_sample.xlsx']
In [4]: merged_sources, feature_names, sources_names = merge_sources(data_files)
In [5]: cm_sim_matrix = calculate_simmatrix(merged_sources, n_jobs=4)
In [6]: partition = get_partition_spectral(cm_sim_matrix, 3)

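The variable partition will contain the partition of the data into the specified number of clusters (3 in this example). You can specify more than one input data file by adding entries to the list data_files.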
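Once a partition is available, a common next step is to list which items fall into each cluster. The sketch below assumes that the partition is a sequence of integer cluster labels, one per entry of feature_names and in the same order; this is an assumption about the return value, not something stated here. The example data is made up so the snippet runs without Clustermatch installed.

```python
from collections import defaultdict


def group_by_cluster(labels, names):
    """Group names by their cluster label, preserving input order."""
    groups = defaultdict(list)
    for label, name in zip(labels, names):
        groups[int(label)].append(name)
    return dict(groups)


# Made-up stand-ins for `partition` and `feature_names` (assumed to be aligned).
example_partition = [0, 2, 1, 0, 2]
example_names = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]

for cluster, members in sorted(group_by_cluster(example_partition, example_names).items()):
    print(cluster, members)
```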

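Clustermatch can handle different data types (numerical, ordinal or categorical) without prior preprocessing. In the current implementation, if a variable contains text it is treated as categorical. The rest, numerical and ordinal variables, are handled in a similar way, so you are responsible for encoding ordinal variables properly (for example, low, normal and high could be encoded as 0, 1, 2; otherwise, if left as text, they will be treated as categorical).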
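The ordinal encoding described above can be done before the data is saved to the input file. This is a minimal sketch in plain Python using the low/normal/high example from the text; the function encode_ordinal and the choice to reject unknown labels are illustrative assumptions.

```python
# Mapping from ordinal text labels to integers that preserve their order.
ORDINAL_LEVELS = {"low": 0, "normal": 1, "high": 2}


def encode_ordinal(values, levels=ORDINAL_LEVELS):
    """Replace ordinal text labels with their integer codes."""
    encoded = []
    for value in values:
        key = str(value).strip().lower()
        if key not in levels:
            # Fail loudly rather than silently leaving text behind, since
            # remaining text would be treated as categorical.
            raise ValueError(f"unexpected ordinal level: {value!r}")
        encoded.append(levels[key])
    return encoded


print(encode_ordinal(["low", "High", "normal", "high"]))  # [0, 2, 1, 2]
```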

clustermatch 0.1.4a3
