在一个大数组中识别以最大距离分隔的Python数组单元对？

import numpy as np import matplotlib.pyplot as plt # Example 2-D habitat array (1 = data) example_array = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]) # Plot example array plt.imshow(example_array, cmap="spectral", interpolation='nearest')

1条回答

网友

1楼 · 发布于 2024-05-16 08:14:25

我得承认我的妞妞很弱也许有办法直接做。尽管如此，这个问题在纯Python中并不困难。下面的代码将输出匹配数据的x/y坐标对。有很多潜在的优化可能会模糊代码并使代码运行得更快，但是考虑到数据集的大小和示例radius的大小（2.0），我怀疑其中任何一个都是值得的（除了在数组中创建numy视图而不是子列表）。在

更新了代码已经修复了几个错误（1）它在低于起点的行上看得太远，以及（2）它在左边缘附近没有做正确的事情。函数的调用现在使用半径为2.5来显示如何获取额外的对。在

example_array = [[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
                [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
                [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
                [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1],
                [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0],
                [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

def findpairs(mylist, radius = 2.0):
    """
    Find pairs with data within a given radius.
    If we work from the top of the array down, we never
    need to look up (because we already would have found
    those, and we never need to look left on the same line.
    """

    # Create the parameters of a half circle, which is
    # the relative beginning and ending X coordinates to
    # search for each Y line starting at this one and
    # working down.  To avoid duplicates and extra work,
    # not only do we not look up, we never look left on
    # the same line as what we are matching, but we do
    # on subsequent lines.

    semicircle = []
    x = 1
    while x:
        y = len(semicircle)
        x = int(max(0, (radius ** 2 - y ** 2)) ** 0.5)
        # Don't look back on same line...
        semicircle.append((-x if y else 1, x + 1))

    # The maximum number of y lines we will search
    # at a time.
    max_y = len(semicircle)

    for y_start in range(len(mylist)):
        sublists = enumerate(mylist[y_start:y_start + max_y], y_start)
        sublists = zip(semicircle, sublists)
        check = (x for (x, value) in enumerate(mylist[y_start]) if value)
        for x_start in check:
            for (x_lo, x_hi), (y, ylist) in sublists:
                # Deal with left edge problem
                x_lo = max(0, x_lo + x_start)
                xlist = ylist[x_lo: x_start + x_hi]
                for x, value in enumerate(xlist, x_lo):
                    if value:
                        yield (x_start, y_start), (x, y)

print(list(findpairs(example_array, 2.5)))

执行时间将高度依赖于数据。对于grins，我创建了您指定的大小（13500 x 12000）的数组来测试计时。我使用了一个更大的半径（3.0而不是2.0），并尝试了两种情况：没有匹配，每个匹配。为了避免一次又一次地重新分配列表，我只需运行迭代器并抛出结果。代码如下。对于一个best case（空）数组，它在我的机器上运行了7秒；对于最坏情况（all 1s）数组，时间大约是12分钟。在

^{pr2}$

相关问题更多 >

编程相关推荐

热门问题

热门文章