Numpy:从点列表中提取坐标范围相同的点,避免使用for循环

0 投票
2 回答
74 浏览
提问于 2025-04-14 18:36

我遇到了一个关于numpy的挑战。

我有一组三维点的列表。每个点的前两个值是它在图像上的坐标,第三个值是与这个点相关的一些数值。

我想在图像上放一个二维网格,并建立一个数组,里面包含每个小块的平均值,这样每个小块都有一个平均值。

我已经用numpy实现了这个功能,但为了优化(其实我并不太需要,但现在我很好奇是否可以做到),我想找一种方法来去掉下面代码中的最后两个for循环,完全用numpy来实现。我觉得可以用np.mgrid,但我还没找到具体怎么用。你能帮我吗?

import numpy as np

points_range = np.array([2, 5, 1])
points = np.random.random((100, 3))  # points[:, i] will be referred as x, y, z for i in 0, 1, 2
points *= points_range

x_steps = 10  # We will divide the points x space in 10 columns
y_steps = 15  # We will divide the points y space in 15 rows
x_step_size = points_range[0] / x_steps
y_step_size = points_range[1] / y_steps

# Do something with that ???
X, Y = np.mgrid[0:points_range[0]:x_step_size, 0:points_range[1]:y_step_size]

# Try the same with numpy: WORK
means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
for x_step_id in range(x_steps):
    for y_step_id in range(y_steps):
        # Compute the z average for every points in the tile [x_step_id, y_step_id]
        # -> get points that are inside this tile
        p = points[(points[:, 0] >= x_step_size * x_step_id) &
                   (points[:, 0] < x_step_size * (x_step_id + 1)) &
                   (points[:, 1] >= y_step_size * y_step_id) &
                   (points[:, 1] < y_step_size * (y_step_id + 1))]
        # -> get z average on these points
        mean_z = np.mean(p[:, 2])
        means[x_step_id][y_step_id] = 0 if np.isnan(mean_z) else mean_z

解决方案

这段代码比较了三种版本及其运行时间。
在我电脑上运行这段代码的输出是(时间单位为秒,1000万个点,15*15的网格):

Method for_loop_1 took 37.779709815979004
Method for_loop_2 took 14.891589879989624
Method for_plus_numpy took 16.47166681289673
Method full_numpy took 1.1438612937927246

代码:

import numpy as np
import time

points_range = np.array([2, 5, 1])
points = np.random.random((10000000, 3))  # points[:, i] will be referred as x, y, z for i in 0, 1, 2
points *= points_range

x_steps = 15  # We will divide the points x space in 10 columns
y_steps = 15  # We will divide the points y space in 15 rows
x_step_size = points_range[0] / x_steps
y_step_size = points_range[1] / y_steps


methods = []
def for_loop_1():
    start = time.time()
    means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
    n = np.zeros_like(means)
    for p in points:
        tile_x = int(p[0] // x_step_size)
        tile_y = int(p[1] // y_step_size)
        means[tile_x][tile_y] = (p[2] + n[tile_x][tile_y] * means[tile_x][tile_y]) / (n[tile_x][tile_y] + 1)
        n[tile_x][tile_y] += 1
    stop = time.time()
    return means, stop - start
methods.append(for_loop_1)

def for_loop_2():
    start = time.time()
    sums = np.zeros((x_steps, y_steps))
    counts = np.zeros((x_steps, y_steps))
    for point in points:
        tile_index_x = int(point[0] // x_step_size)
        tile_index_y = int(point[1] // y_step_size)
        sums[tile_index_x, tile_index_y] += point[2]
        counts[tile_index_x, tile_index_y] += 1
    means = np.where(counts > 0, sums / counts, 0)
    stop = time.time()
    return means, stop - start
methods.append(for_loop_2)

# Try the same with numpy
def for_plus_numpy():  # My method
    start = time.time()
    means = [[0 for _ in range(y_steps)] for _ in range(x_steps)]
    for x_step_id in range(x_steps):
        for y_step_id in range(y_steps):
            # Compute the z average for every points in the tile [x_step_id, y_step_id]
            # -> get points that are inside this tile
            p = points[(points[:, 0] >= x_step_size * x_step_id) &
                       (points[:, 0] < x_step_size * (x_step_id + 1)) &
                       (points[:, 1] >= y_step_size * y_step_id) &
                       (points[:, 1] < y_step_size * (y_step_id + 1))]
            # -> get z average on these points
            mean_z = np.mean(p[:, 2])
            means[x_step_id][y_step_id] = 0 if np.isnan(mean_z) else mean_z
    stop = time.time()
    return means, stop - start
methods.append(for_plus_numpy)


def full_numpy():
    start = time.time()
    edges = np.linspace(0, points_range[0], x_steps+1), np.linspace(0, points_range[1], y_steps+1)
    sums, _, _ = np.histogram2d(points[:, 0], points[:, 1], edges, weights=points[:, 2])
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], edges)
    means = np.where(counts > 0, sums / counts, 0)
    stop = time.time()
    return means, stop - start
methods.append(full_numpy)


for method in methods:
    means, duration = method()
    print("Method ", method.__name__ + " took ", duration, sep="")

2 个回答

1

与其对每个方块都遍历整个列表,不如一次性遍历所有的点。对于每个点,检查它属于哪个方块,然后通过计算累积平均值来更新相应的平均值。这意味着,你只需要遍历一次列表。

import numpy as np

points_range = np.array([2, 5, 1])
points = np.random.random((100, 3))  # points[:, i] will be referred as x, y, z for i in 0, 1, 2
points *= points_range

x_steps = 10  # We will divide the points x space in 10 columns
y_steps = 15  # We will divide the points y space in 15 rows
x_step_size = points_range[0] / x_steps
y_step_size = points_range[1] / y_steps

means = np.zeros((x_steps, y_steps))
# keep track of current number of points per tile for running average
points_per_tile = np.zeros_like(means)

for point in points:
    # check which tile the points belongs to
    tile_index_x = int(point[0] // x_step_size)
    tile_index_y = int(point[1] // y_step_size)
    # get current mean and current N
    current_mean = means[tile_index_x, tile_index_y]
    current_points = points_per_tile[tile_index_x, tile_index_y]
    # get updated mean
    updated_mean = (current_mean * current_points + point[2]) / (current_points + 1)
    # update values
    means[tile_index_x, tile_index_y] = updated_mean
    points_per_tile[tile_index_x, tile_index_y] += 1

建议修改:

(通过这个修改,效率和原作者在100万点上的解决方案相似)

# keep track of current sum of points per tile for running average
sums = np.zeros((x_steps, y_steps))
# keep track of current number of points per tile for running average
counts = np.zeros((x_steps, y_steps))

for point in points:
    # check which tile the points belongs to
    tile_index_x = int(point[0] // x_step_size)
    tile_index_y = int(point[1] // y_step_size)
    # update
    sums[tile_index_x, tile_index_y] += point[2]
    counts[tile_index_x, tile_index_y] += 1

means = np.where(counts>0, sums/counts, 0)
2

这里有另一种方法,它更简洁,也更高效(在处理100万个点时大约快10倍)

edges = np.linspace(0,points_range[0],x_steps+1), np.linspace(0,points_range[1],y_steps+1)
sums, _, _ = np.histogram2d(points[:,0], points[:,1], edges, weights=points[:,2])
counts, _, _ = np.histogram2d(points[:,0], points[:,1], edges)
means = np.where(counts>0, sums/counts, 0)

撰写回答