Numpy：从点列表中提取坐标范围相同的点，避免使用for循环

Question

我遇到了一个关于numpy的挑战。

我有一组三维点的列表。每个点的前两个值是它在图像上的坐标，第三个值是与这个点相关的一些数值。

我想在图像上放一个二维网格，并建立一个数组，里面包含每个小块的平均值，这样每个小块都有一个平均值。

我已经用numpy实现了这个功能，但为了优化（其实我并不太需要，但现在我很好奇是否可以做到），我想找一种方法来去掉下面代码中的最后两个for循环，完全用numpy来实现。我觉得可以用np.mgrid，但我还没找到具体怎么用。你能帮我吗？

import numpy as np

points_range = np.array([2, 5, 1])
points = np.random.random((100, 3))  # points[:, i] will be referred as x, y, z for i in 0, 1, 2
points *= points_range

x_steps = 10  # We will divide the points x space in 10 columns
y_steps = 15  # We will divide the points y space in 15 rows
x_step_size = points_range[0] / x_steps
y_step_size = points_range[1] / y_steps

# Do something with that ???
X, Y = np.mgrid[0:points_range[0]:x_step_size, 0:points_range[1]:y_step_size]

# Try the same with numpy: WORK
means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
for x_step_id in range(x_steps):
    for y_step_id in range(y_steps):
        # Compute the z average for every points in the tile [x_step_id, y_step_id]
        # -> get points that are inside this tile
        p = points[(points[:, 0] >= x_step_size * x_step_id) &
                   (points[:, 0] < x_step_size * (x_step_id + 1)) &
                   (points[:, 1] >= y_step_size * y_step_id) &
                   (points[:, 1] < y_step_size * (y_step_id + 1))]
        # -> get z average on these points
        mean_z = np.mean(p[:, 2])
        means[x_step_id][y_step_id] = 0 if np.isnan(mean_z) else mean_z

解决方案

这段代码比较了三种版本及其运行时间。
在我电脑上运行这段代码的输出是（时间单位为秒，1000万个点，15*15的网格）：

Method for_loop_1 took 37.779709815979004
Method for_loop_2 took 14.891589879989624
Method for_plus_numpy took 16.47166681289673
Method full_numpy took 1.1438612937927246

代码：

import numpy as np
import time

points_range = np.array([2, 5, 1])
points = np.random.random((10000000, 3))  # points[:, i] will be referred as x, y, z for i in 0, 1, 2
points *= points_range

x_steps = 15  # We will divide the points x space in 10 columns
y_steps = 15  # We will divide the points y space in 15 rows
x_step_size = points_range[0] / x_steps
y_step_size = points_range[1] / y_steps


methods = []
def for_loop_1():
    start = time.time()
    means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
    n = np.zeros_like(means)
    for p in points:
        tile_x = int(p[0] // x_step_size)
        tile_y = int(p[1] // y_step_size)
        means[tile_x][tile_y] = (p[2] + n[tile_x][tile_y] * means[tile_x][tile_y]) / (n[tile_x][tile_y] + 1)
        n[tile_x][tile_y] += 1
    stop = time.time()
    return means, stop - start
methods.append(for_loop_1)

def for_loop_2():
    start = time.time()
    sums = np.zeros((x_steps, y_steps))
    counts = np.zeros((x_steps, y_steps))
    for point in points:
        tile_index_x = int(point[0] // x_step_size)
        tile_index_y = int(point[1] // y_step_size)
        sums[tile_index_x, tile_index_y] += point[2]
        counts[tile_index_x, tile_index_y] += 1
    means = np.where(counts > 0, sums / counts, 0)
    stop = time.time()
    return means, stop - start
methods.append(for_loop_2)

# Try the same with numpy
def for_plus_numpy():  # My method
    start = time.time()
    means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
    for x_step_id in range(x_steps):
        for y_step_id in range(y_steps):
            # Compute the z average for every points in the tile [x_step_id, y_step_id]
            # -> get points that are inside this tile
            p = points[(points[:, 0] >= x_step_size * x_step_id) &
                       (points[:, 0] < x_step_size * (x_step_id + 1)) &
                       (points[:, 1] >= y_step_size * y_step_id) &
                       (points[:, 1] < y_step_size * (y_step_id + 1))]
            # -> get z average on these points
            mean_z = np.mean(p[:, 2])
            means[x_step_id][y_step_id] = 0 if np.isnan(mean_z) else mean_z
    stop = time.time()
    return means, stop - start
methods.append(for_plus_numpy)


def full_numpy():
    start = time.time()
    edges = np.linspace(0, points_range[0], x_steps+1), np.linspace(0, points_range[1], y_steps+1)
    sums, _, _ = np.histogram2d(points[:, 0], points[:, 1], edges, weights=points[:, 2])
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], edges)
    means = np.where(counts > 0, sums / counts, 0)
    stop = time.time()
    return means, stop - start
methods.append(full_numpy)


for method in methods:
    means, duration = method()
    print("Method ", method.__name__ + " took ", duration, sep="")

性能优化数据处理 numpy 坐标提取平均值计算网格化矢量化 mgrid

Numpy：从点列表中提取坐标范围相同的点，避免使用for循环

解决方案

2 个回答

撰写回答