Removing runs from a 2D numpy array

Published 2024-03-29 07:00:26


Given a 2D numpy array:

00111100110111
01110011000110
00111110001000
01101101001110

Is there an efficient way to replace runs of 1s of length >= N?

For example, if N=3:

00222200110222
02220011000110
00222220001000
01101101002220

In reality the 2D array is binary and I want to replace the runs of 1s with 0s, but for clarity I replaced them with 2s in the example above.

Runnable example: http://runnable.com/U6q0q-TFWzxVd_Uf/numpy-replace-runs-for-python

The code I am currently using looks a bit hacky, and I feel there is probably some magic numpy way of doing this:

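Update: I am aware that I changed the example to a version that did not handle corner cases. That was a minor implementation bug (now fixed). I am more interested in whether there is a faster way of doing this.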

import numpy as np
import time

def replace_runs(a, search, run_length, replace = 2):
  a_copy = a.copy() # Don't modify original
  for i, row in enumerate(a):
    runs = []
    current_run = []
    for j, val in enumerate(row):
      if val == search:
        current_run.append(j)
      else:
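        # As posted: when the last column is not a match, the "j == len(row) - 1" test also accepts a shorter run that ends just before it
        if len(current_run) >= run_length or j == len(row) -1: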
          runs.append(current_run)
        current_run = []

    if len(current_run) >= run_length or j == len(row) -1:
      runs.append(current_run)

    for run in runs:
      for col in run:
        a_copy[i][col] = replace

  return a_copy

arr = np.array([
  [0,0,1,1,1,1,0,0,1,1,0,1,1,1],
  [0,1,1,1,0,0,1,1,0,0,0,1,1,0],
  [0,0,1,1,1,1,1,0,0,0,1,0,0,0],
  [0,1,1,0,1,1,0,1,0,0,1,1,1,0],
  [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
  [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
  [1,1,1,1,1,1,1,1,1,1,1,1,1,0],
  [0,1,1,1,1,1,1,1,1,1,1,1,1,1],
])

print(arr)
print(replace_runs(arr, 1, 3))

iterations = 100000

t0 = time.time()
for i in range(0,iterations):
  replace_runs(arr, 1, 3)
t1 = time.time()

print("replace_runs: %d iterations took %.3fs" % (iterations, t1 - t0))

Output:

[[0 0 1 1 1 1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 1 0 0 0 1 1 0]
 [0 0 1 1 1 1 1 0 0 0 1 0 0 0]
 [0 1 1 0 1 1 0 1 0 0 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 1]]

[[0 0 2 2 2 2 0 0 1 1 0 2 2 2]
 [0 2 2 2 0 0 1 1 0 0 0 2 2 0]
 [0 0 2 2 2 2 2 0 0 0 1 0 0 0]
 [0 1 1 0 1 1 0 1 0 0 2 2 2 0]
 [2 2 2 2 2 2 2 2 2 2 2 2 2 2]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [2 2 2 2 2 2 2 2 2 2 2 2 2 0]
 [0 2 2 2 2 2 2 2 2 2 2 2 2 2]]

replace_runs: 100000 iterations took 14.406s

3 Answers

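I will treat the input as a 1D array, since the approach generalizes to 2D.

In binary, you can use & to check whether two items are both 1. In numpy, you can efficiently "shift" an array by slicing it. So build a second array that has a 1 at every position you want to unset (or change to 2). Then ^ or + it onto the original, depending on whether you want to turn those positions into 0 or 2: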

import numpy as np

def unset_ones(a, n):
    m = a.size - n + 1 # number of length-n windows in a
    if m <= 0:
        return a.copy() # a is too short to contain a run of n
    match = a[:m].copy()
    for i in range(1, n): # find 1s that have n-1 1s following
        match &= a[i:i+m]
    matchall = np.zeros_like(a)
    for i in range(n): # mark each matching 1 and the n-1 1s following it
        matchall[i:i+m] |= match
    b = a.copy()
    b ^= matchall # xor into the original data; replace by + to make 2s
    return b
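
The same shift-and-AND idea can be carried over to 2D by slicing along the column axis. The following is only a minimal sketch under the assumption of a 0/1 integer array; the name `unset_ones_2d` and the `replace` parameter are made up for illustration, and it builds a boolean mask and assigns through it rather than xor-ing.

```python
import numpy as np

def unset_ones_2d(a, n, replace=0):
    """Replace horizontal runs of at least n ones in a 2D 0/1 array."""
    rows, cols = a.shape
    out = a.copy()
    m = cols - n + 1              # number of length-n windows per row
    if m <= 0:
        return out                # rows are too short to hold a run of n
    # starts[r, k] is True when a[r, k:k+n] is all ones
    starts = a[:, :m] == 1
    for i in range(1, n):
        starts &= a[:, i:i + m] == 1
    # spread each window start over the n columns it covers
    mask = np.zeros(a.shape, dtype=bool)
    for i in range(n):
        mask[:, i:i + m] |= starts
    out[mask] = replace
    return out

arr = np.array([
    [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0],
])
print(unset_ones_2d(arr, 3, replace=2))
# [[0 0 2 2 2 2 0 0 1 1 0 2 2 2]
#  [0 2 2 2 0 0 1 1 0 0 0 1 1 0]]
```

Both loops run n times over whole-array slices, so the work is done in numpy rather than per element.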


Pattern matching via convolution:

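def replace_runs(a, N, replace = 2):
    # Note: the neighbour expansion below assumes N == 3 and a 0/1 array
    a_copy = a.copy()
    pattern = np.ones(N, dtype=int)
    M = a_copy.shape[1]

    for i, row in enumerate(a_copy):
        # conv[k] == N where row[k-1], row[k] and row[k+1] are all 1
        conv = np.convolve(row, pattern, mode='same')
        match = np.where(conv==N)

        # replace the centre of each match, then its left and right neighbours
        a_copy[i][match]=replace
        a_copy[i][match[0][match[0]-1>=0]-1]=replace
        a_copy[i][match[0][match[0]+1<M]+1]=replace
    return a_copy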

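It is 3 times slower than the original replace_runs, but it handles the corner cases (as does the proposed string-based approach).

On my machine:

replace_runs_org: 100000 iterations took 12.792s

replace_runs_var: 100000 iterations took 33.112s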
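
The function above expands each match by one position to the left and right, which fits N = 3. As a rough sketch for other values of N, and still assuming a 0/1 array, one could use `mode='valid'` so that each convolution output corresponds to a window start, then mark all N positions of every matching window. The name `replace_runs_conv` is hypothetical, and no timing is claimed for it.

```python
import numpy as np

def replace_runs_conv(a, N, replace=2):
    """Replace runs of at least N ones in each row of a 2D 0/1 array."""
    out = a.copy()
    kernel = np.ones(N, dtype=int)
    for i, row in enumerate(a):
        if row.size < N:
            continue              # row cannot contain a run of N
        # window_sum[k] == N exactly when row[k:k+N] is all ones
        window_sum = np.convolve(row, kernel, mode='valid')
        starts = np.flatnonzero(window_sum == N)
        # every column covered by a matching window gets replaced
        for offset in range(N):
            out[i, starts + offset] = replace
    return out

a = np.array([
    [1, 1, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 1],
])
print(replace_runs_conv(a, 3))
# [[2 2 2 2 0 1 1]
#  [0 1 1 0 2 2 2]]
```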

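First of all, your code does not work correctly: it replaces the cluster of only two 1s at the end of the second row with 2s. That said, the following does what your text describes: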

def replace_runs_bis(arr, search=1, n=3, val=2):
    ret = np.array(arr) # this makes a copy by default
    rows, cols = arr.shape
    # Fast convolution with an all 1's kernel
    arr_cum = np.cumsum(arr == search, axis=1)
    arr_win = np.empty((rows, cols-n+1), dtype=np.intp)
    arr_win[:, 0] = arr_cum[:, n-1]
    arr_win[:, 1:] = arr_cum[:, n:] - arr_cum[:, :-n]
    mask_win = arr_win >= n
    # mask_win is True for n-item windows that are full of search values; expand to pixels
    mask = np.zeros_like(arr, dtype=bool)
    for j in range(n):
        sl_end = -n+j+1
        sl_end = sl_end if sl_end else None
        mask[:, j:sl_end] |= mask_win
    #replace values
    ret[mask] = val

    return ret

For your example array it is about 2x faster, but I would guess that for larger arrays it will be much faster, as long as n stays small.
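
The loop over `range(n)` above is why the cost grows with n. For comparison, here is a sketch of a different technique, not taken from any of the answers: locate run boundaries with `np.diff` on a zero-padded copy and keep only runs that are long enough, so the numpy part does not depend on n. The name `replace_runs_rle` is hypothetical, and the final loop runs once per long run, so whether it is actually faster would have to be measured.

```python
import numpy as np

def replace_runs_rle(arr, search=1, n=3, val=2):
    """Replace horizontal runs of `search` that are at least n long."""
    hit = (arr == search)
    rows, cols = hit.shape
    # pad with a zero column on both sides so runs touching the edges
    # still produce a start and an end transition
    padded = np.zeros((rows, cols + 2), dtype=np.int8)
    padded[:, 1:-1] = hit
    edges = np.diff(padded, axis=1)   # +1 at a run start, -1 one past a run end
    r_start, c_start = np.nonzero(edges == 1)
    r_end, c_end = np.nonzero(edges == -1)
    # np.nonzero scans in row-major order, so the i-th start pairs with the i-th end
    long_run = (c_end - c_start) >= n
    ret = arr.copy()
    for r, s, e in zip(r_start[long_run], c_start[long_run], c_end[long_run]):
        ret[r, s:e] = val
    return ret

row = np.array([[0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]])
print(replace_runs_rle(row))
# [[0 2 2 2 0 0 1 1 0 0 0 1 1 0]]
```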
