kNN prediction differs from what the distance values suggest

    # fit the classifier
    >>> y = array(['financial_services', 'health_care', 'information_technology'], dtype=object)
    >>> X.shape = (3L, 571L)
    neigh = KNeighborsClassifier(n_neighbors=3)
    neigh.fit(X, y)

    # predict the result for some website (predict is a matrix with my features)
    print(neigh.predict(predict))
    >>> ['financial_services']  # predict the first category

    print(neigh.kneighbors(predict))  # get the "distances" to each category
    >>> (array([[ 2323819.25162006, 2323841.23289028, 2323852.69883011]]), array([[2, 0, 1]], dtype=int64))
    # we can see that this website is closer to the category #2, which is IT

1 Answer

User

#1 · Posted on 2024-06-02 05:22:35

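According to the sklearn documentation, the index array returned by kneighbors gives you the "indices of the nearest points in the population matrix", not class labels. The distances are sorted from nearest to farthest, and each index tells you which training instance the corresponding distance belongs to. So [[ 2323819.25162006, 2323841.23289028, 2323852.69883011]], [[2, 0, 1]] tells you that the third training instance (index 2) is the nearest one, at a distance of 2323819.25162006. This can be a little confusing here because the training data has exactly 3 points and k=3. The thing to remember is that the indices refer to positions in the original array of training samples, not to positions in the distance array returned by the same function.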
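To make the meaning of the index array concrete, here is a minimal sketch. The 2-D training points and the query point are made up for illustration (the original X with 571 features is not available); only the labels come from the question. It shows how to turn the indices returned by kneighbors into class labels by indexing y.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one 2-D training sample per class.
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.array(
    ["financial_services", "health_care", "information_technology"],
    dtype=object,
)

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)

# Hypothetical query point, chosen to lie closest to the third training row.
query = np.array([[1.0, 8.0]])
distances, indices = neigh.kneighbors(query)

# The indices are row numbers into X (and therefore into y),
# ordered from nearest to farthest neighbor.
neighbor_labels = y[indices[0]]
nearest_label = neighbor_labels[0]

print(indices)          # row numbers of the training samples
print(neighbor_labels)  # the class label of each of those rows
```

With this toy data the nearest neighbor is row 2, so indexing y with the returned indices is what connects a distance to a class.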

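I think your main problem is that the population matrix contains 3 examples of 3 classes (1 per class). When you set n_neighbors=3, what do you expect the classifier to do? It finds the three nearest neighbors of a test point, but there are only three examples, and they all have different classes.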

The user guide has this to say:

Classification is computed from a simple majority vote of the nearest neighbors of each point

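In your case, that means the 3 nearest points. Again, there are only three points and they all have different classes, so every class gets exactly one vote and the majority vote can never work properly.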
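The tie can be seen directly by counting the votes. The sketch below reuses made-up 2-D points (an assumption, since the real feature matrix is not shown) and compares k=3 with k=1. With k=3 each class receives a single vote, so the distances play no part in the outcome; with k=1 the prediction is simply the label of the nearest training point.

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one 2-D training sample per class.
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.array(
    ["financial_services", "health_care", "information_technology"],
    dtype=object,
)
query = np.array([[1.0, 8.0]])  # hypothetical point near the third row

# k=3 with only three training points: all of them are neighbors,
# so each class gets exactly one vote and there is no majority.
knn3 = KNeighborsClassifier(n_neighbors=3).fit(X, y)
_, indices = knn3.kneighbors(query)
votes = Counter(y[indices[0]])
print(votes)

# k=1: the prediction is the label of the single nearest training point.
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
pred_k1 = knn1.predict(query)[0]
print(pred_k1)
```

Lowering k is only a workaround for this toy setup. A majority vote becomes meaningful once the training set holds several examples per class.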
