Data type problem when using scipy.spatial

Asked 2025-04-16 05:05

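I want to use KDTree from scipy.spatial to find nearest-neighbor pairs in a two-dimensional array (really a list of lists, where each inner list has two elements). I generate my list, convert it to a numpy array, and then create a KDTree instance. However, whenever I try to run "query" on it, the results look strange. For example, when I enter: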

tree = KDTree(array)
nearest = tree.query(np.array([1, 1]))

nearest prints as (0.0, 0). At the moment, the array I'm using is basically y = x over the range (1,50), so I expected the nearest neighbor of (1,1) to be (2,2).

What am I doing wrong, scipy gurus?

Edit: Also, if anyone can recommend a KDTree package for Python that they have used for finding the nearest neighbor of a given point, I'd be interested.

Tags: numpy, scipy, data types, 2D arrays, kdtree, nearest neighbor, spatial data structures, query algorithms

1 Answer

I have used scipy.spatial before, and it seems to be a big improvement over scikits.ann, especially in terms of its interface.

In this case, I think you may have misread what your call to tree.query(...) returns. From the scipy.spatial.KDTree.query documentation:

Returns
-------

d : array of floats
    The distances to the nearest neighbors.
    If x has shape tuple+(self.m,), then d has shape tuple if
    k is one, or tuple+(k,) if k is larger than one.  Missing
    neighbors are indicated with infinite distances.  If k is None,
    then d is an object array of shape tuple, containing lists
    of distances. In either case the hits are sorted by distance
    (nearest first).
i : array of integers
    The locations of the neighbors in self.data. i is the same
    shape as d.

So when you query for the point nearest to [1,1], you get:

distance to nearest: 0.0
index of nearest in original array: 0

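This means [1,1] is the first row of your original data in array, which is to be expected given that your data is y = x over the range [1,50]. In other words, query returns a (distance, index) pair rather than the coordinates of the neighbor, and the point nearest to [1,1] is [1,1] itself.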
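To make the return value concrete, here is a minimal sketch. The data is an assumed reconstruction of the asker's array (points on y = x for x = 1 through 50), and the variable names are mine, not from the question.

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed reconstruction of the asker's data: points on y = x, x = 1..50.
points = np.array([[x, x] for x in range(1, 51)], dtype=float)
tree = KDTree(points)

# query returns (distance, index), not the neighbor's coordinates.
distance, index = tree.query(np.array([1, 1]))

print(distance, index)   # distance to the nearest point and its row in `points`
print(points[index])     # recover the coordinates through the index
```

Because [1,1] is itself a row of the data, the nearest point is at distance zero and its index is the row where [1,1] sits.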

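scipy.spatial.KDTree.query has plenty of other options. For example, if you want to make sure the nearest neighbor you get back is not the query point itself, try: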

tree.query([1,1], k=2)

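This returns the two nearest neighbors. You can then post-process the result: when the first returned distance is zero (that is, the query point is one of the data points used to build the tree), take the second nearest neighbor instead of the first.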
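The post-processing described above might look roughly like the sketch below. The helper name nearest_other is hypothetical (it is not part of scipy), and the data is again an assumed y = x array for x = 1 through 50.

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed data: points on y = x, x = 1..50.
points = np.array([[x, x] for x in range(1, 51)], dtype=float)
tree = KDTree(points)

def nearest_other(tree, point):
    """Hypothetical helper: nearest neighbor of `point`, skipping an exact match."""
    # With k=2, both results are length-2 arrays sorted nearest first.
    distances, indices = tree.query(point, k=2)
    # A zero first distance means the query point is itself in the tree,
    # so fall back to the second hit.
    pick = 1 if distances[0] == 0.0 else 0
    return distances[pick], indices[pick]

dist, idx = nearest_other(tree, [1, 1])
print(tree.data[idx], dist)  # coordinates and distance of the nearest other point
```

One caveat with this approach: if the data contains duplicate points, the second hit can also be at distance zero, so a larger k or an explicit index comparison may be needed.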

Answered 2025-04-16 by Python大师
