在Python中有效找到大于阈值的首个样本（并比较MATLAB）

5 投票

3 回答

8281 浏览

提问于 2025-04-18 07:35

我想要找到一个列表或数组中，信号第一次超过某个特定值（我们称之为threshold）的样本，而不是找出所有超过这个值的样本。因为信号可能会多次超过这个值。比如说，如果我有一个信号：

signal = [1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0]

而threshold = 2，那么

signal = numpy.array(signal)
is_bigger_than_threshold = signal > threshold

会给我所有在signal中大于threshold的值。

但是，我只想要信号第一次超过这个值的样本。因此，我需要遍历整个列表，进行布尔比较，比如：

first_bigger_than_threshold = list()
first_bigger_than_threshold.append(False)
for i in xrange(1, len(is_bigger_than_threshold)):
    if(is_bigger_than_threshold[i] == False):
        val = False
    elif(is_bigger_than_threshold[i]):
        if(is_bigger_than_threshold[i - 1] == False):
            val = True
        elif(is_bigger_than_threshold[i - 1] == True):
            val = False
    first_bigger_than_threshold.append(val)

这样可以得到我想要的结果，也就是：

[False, False, True, False, False, False, False, False, False, True, False, False, False,   
False, False, False, True, False, False, False, False, False]

在MATLAB中，我会用类似的方法来做：

for i = 2 : numel(is_bigger_than_threshold)
    if(is_bigger_than_threshold(i) == 0)
        val = 0;
    elseif(is_bigger_than_threshold(i))
        if(is_bigger_than_threshold(i - 1) == 0)
            val = 1;
        elseif(is_bigger_than_threshold(i - 1) == 1)
            val = 0;
        end
    end
    first_bigger_than_threshold(i) = val;
end % for

有没有更高效（更快）的方法来进行这个计算呢？

如果我在Python中生成数据，比如：

signal = [round(random.random() * 10) for i in xrange(0, 1000000)]

并进行计时，计算这些值花了4.45秒。如果我在MATLAB中生成数据：

signal = round(rand(1, 1000000) * 10);

并执行程序，只需要0.92秒。

为什么MATLAB在执行这个任务时比Python快将近5倍呢？

谢谢大家的评论！

性能优化 matlab 信号处理编程效率数组处理阈值比较数据生成布尔比较

3 个回答

根据一个观点，想要加快速度，最好的办法就是选择最合适的算法。这里有一个简单的边缘检测器，可以做到这一点：

import numpy

signal = numpy.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])

thresholded_data = signal > threshold
threshold_edges = numpy.convolve([1, -1], thresholded_data, mode='same')

thresholded_edge_indices = numpy.where(threshold_edges==1)[0]

print(thresholded_edge_indices)

这个代码会输出 [2 9 16]，这些数字代表在一个序列中，第一个超过阈值的元素的位置。这种方法在Matlab和Python（使用Numpy）中都能让处理速度更快——在我的电脑上，这样做大约只需要12毫秒，而你之前的方法需要4.5秒。

补充说明：正如@eickenberg指出的，卷积操作可以用 numpy.diff(thresholded_data) 来替代，这样做在概念上更简单。不过要注意，这样得到的位置会少1，所以记得要把这个补回来。同时，还需要把 thresholded_data 转换成整数数组，使用 thresholded_data.astype(int)。这两种方法在速度上没有明显的差别。

回答于 2025-04-18 由 Python大师

分享举报

这篇文章解释了为什么你的代码比Matlab慢。

试试这个代码

import numpy as np

threshold = 2
signal = np.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])

indices_bigger_than_threshold = np.where(signal > threshold)[0] # get item
print indices_bigger_than_threshold
# [ 2  3  4  5  9 16 17 18 19 20]
non_consecutive = np.where(np.diff(indices_bigger_than_threshold) != 1)[0]+1 # +1 for selecting the next
print non_consecutive
# [4 5]
first_bigger_than_threshold1 = np.zeros_like(signal, dtype=np.bool)
first_bigger_than_threshold1[indices_bigger_than_threshold[0]] = True # retain the first
first_bigger_than_threshold1[indices_bigger_than_threshold[non_consecutive]] = True

np.where 会返回符合条件的索引。

这个方法的思路是找出大于 threshold 的索引，并去掉连续的部分。

顺便说一下，欢迎来到Python/Numpy的世界。

回答于 2025-04-18 由 Python大师

分享举报

其他的回答给出了第一个为真的位置，如果你想要一个布尔数组来标记第一个为真，你可以更快地做到这一点：

import numpy as np

signal = np.random.rand(1000000)
th = signal > 0.5
th[1:][th[:-1] & th[1:]] = False

回答于 2025-04-18 由 Python大师

分享举报

在Python中有效找到大于阈值的首个样本（并比较MATLAB）

3 个回答

撰写回答