Unsupervised learning - clustering numpy arrays nested inside a numpy array

Posted 2024-04-25 16:48:30

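We are working with a speech dataset. The waveforms have been converted to MFCC values. Each row (one wav file) consists of roughly 20 to 40 arrays, depending on the length of the sound file, and each of those arrays holds 13 float values. The goal of the task is to recognize 10 spoken digits. Because we have no labels, we want to use an unsupervised learning method to split the files into 10 groups.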

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt


def kmeans(data, k=3, normalize=False, limit=500):
    """Basic k-means clustering algorithm.
    """
    # optionally normalize the data. k-means will perform poorly or strangely if the dimensions
    # don't have the same ranges.
    if normalize:
        stats = (data.mean(axis=0), data.std(axis=0))
        data = (data - stats[0]) / stats[1]

    # pick the first k points to be the centers. this also ensures that each group has at least
    # one point.
    centers = data[:k]

    for i in range(limit):
        # core of clustering algorithm...
        # first, use broadcasting to calculate the distance from each point to each center, then
        # classify based on the minimum distance.
        classifications = np.argmin(((data[:, :, None] - centers.T[None, :, :])**2).sum(axis=1), axis=1)
        # next, calculate the new centers for each cluster.
        new_centers = np.array([data[classifications == j, :].mean(axis=0) for j in range(k)])

        # if the centers aren't moving anymore it is time to stop.
        if (new_centers == centers).all():
            break
        else:
            centers = new_centers
    else:
        # this will not execute if the for loop exits on a break.
        raise RuntimeError(f"Clustering algorithm did not complete within {limit} iterations")

    # if data was normalized, the cluster group centers are no longer scaled the same way the original
    # data is scaled.
    if normalize:
        centers = centers * stats[1] + stats[0]

    print(f"Clustering completed after {i} iterations")

    return classifications, centers


classifications, centers = kmeans(speechdata, k=5)
plt.figure(figsize=(12, 8))
plt.scatter(x=speechdata[:, 0], y=speechdata[:, 1], s=100, c=classifications)
plt.scatter(x=centers[:, 0], y=centers[:, 1], s=500, c='k', marker='^')
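
To make the broadcasting step inside the loop easier to follow, here is a minimal sketch on a tiny made-up data set. The four points and the two centers are purely illustrative assumptions and are not taken from the speech data. It shows that the distance computation expects `data` to be a regular 2-D array of shape (n_points, n_features).

```python
import numpy as np

# Tiny made-up data set: 4 points with 2 features each (illustration only).
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = data[[0, 2]]  # two hypothetical centers, shape (2, 2)

# data[:, :, None] has shape (4, 2, 1); centers.T[None, :, :] has shape (1, 2, 2).
# Broadcasting gives per-feature differences with axes (point, feature, center).
diff = data[:, :, None] - centers.T[None, :, :]

# Summing the squared differences over the feature axis leaves one squared
# distance per (point, center) pair.
sq_dist = (diff ** 2).sum(axis=1)

# Index of the nearest center for every point.
labels = np.argmin(sq_dist, axis=1)

print(diff.shape, sq_dist.shape, labels)
```

The three-index expression `data[:, :, None]` only works when `data` has exactly two dimensions, which matters for the error described in the question.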

The line "classifications, centers = kmeans(speechdata, k=5)" raises an error: IndexError: too many indices for array.
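
The question does not show how `speechdata` is built, so the following is only a guess at the cause. It assumes `speechdata` is a 1-D NumPy array of dtype object whose elements are the per-file MFCC arrays. Under that assumption, the indexing pattern used inside `kmeans` fails in the way reported. The names `file_a`, `file_b` and `ragged` are hypothetical.

```python
import numpy as np

# Two hypothetical "files" with different numbers of MFCC frames.
file_a = np.zeros((20, 13))
file_b = np.zeros((38, 13))

# Ragged data cannot form a regular 3-D array, so it is stored here as a
# 1-D array whose elements are themselves arrays.
ragged = np.empty(2, dtype=object)
ragged[0] = file_a
ragged[1] = file_b

print(ragged.shape)  # only one dimension is visible to NumPy

error_message = None
try:
    ragged[:, :, None]  # same indexing pattern as data[:, :, None] in kmeans()
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If this matches the real data, the fix is to give `kmeans` a regular 2-D float array rather than an array of arrays.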

How can I convert an array of arrays with differing lengths (one row may have shape (20, 13), another (38, 13)) so that I can cluster them?
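
Two possible ways to obtain a regular 2-D array are sketched below. Neither comes from the original post. The helper names `stack_frames` and `summarize_files` are hypothetical, and the random input only stands in for real MFCC data. The first stacks every frame of every file into one matrix, so each frame becomes a point. The second reduces each file to a fixed-length vector (mean and standard deviation of each coefficient over time), so each file becomes a point.

```python
import numpy as np


def stack_frames(files):
    """Stack all frames into one (total_frames, n_coeffs) matrix.

    Also returns, for every frame, the index of the file it came from.
    """
    frames = np.vstack(files)
    file_ids = np.concatenate([np.full(len(f), i) for i, f in enumerate(files)])
    return frames, file_ids


def summarize_files(files):
    """Return one fixed-length vector per file.

    Each vector is the per-coefficient mean followed by the per-coefficient
    standard deviation, so the number of frames no longer matters.
    """
    return np.array([np.concatenate([f.mean(axis=0), f.std(axis=0)]) for f in files])


# Stand-in data: three "files" with 20, 38 and 27 frames of 13 coefficients.
rng = np.random.default_rng(0)
files = [rng.normal(size=(n, 13)) for n in (20, 38, 27)]

frames, file_ids = stack_frames(files)
features = summarize_files(files)

print(frames.shape)    # (85, 13)
print(features.shape)  # (3, 26)
```

Either result can be passed as `data` to the `kmeans` function above. Since the goal is one group per file, the per-file summary is probably the closer fit: `kmeans(features, k=10, normalize=True)` would return one label per wav file, provided there are at least 10 files. Averaging over time discards the order of the frames, so whether this separates the ten digits well would have to be checked on the real data.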

0 answers

No answers yet.
