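<p>Here's one that uses array slicing to boost performance when finding the start index of each run of consecutive repeated values. It is similar to <a href="https://stackoverflow.com/a/46418523/3293881">this answer</a>, but without any appending/concatenation -</p>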
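<pre><code>import numpy as np
a = np.array(series)
# Run start at interior i: a[i] == a[i+1] and a[i] != a[i-1]; +1 maps slice positions back to indices into a
out = np.flatnonzero((a[2:] == a[1:-1]) & (a[1:-1] != a[:-2]))+1
</code></pre>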
<p>Sample run -</p>
<pre><code>In [28]: a = np.array(series)
In [29]: np.flatnonzero((a[2:] == a[1:-1]) & (a[1:-1] != a[:-2]))+1
Out[29]: array([ 5, 11, 17, 21])
</code></pre>
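<p>To see why the slices line up: for every interior index <code>i</code> (from 1 to <code>len(a) - 2</code>), position <code>i - 1</code> of <code>a[1:-1]</code>, <code>a[2:]</code> and <code>a[:-2]</code> holds <code>a[i]</code>, <code>a[i+1]</code> and <code>a[i-1]</code> respectively. The mask is therefore true exactly where a value equals its successor but not its predecessor, i.e. at the start of a run of repeated values, and the <code>+1</code> maps mask positions back to indices into <code>a</code>. Below is a minimal sketch that checks this against a plain-Python loop over the same interior indices; <code>run_starts</code> and <code>run_starts_loop</code> are hypothetical helper names, not part of the original answer.</p>
<pre><code>import numpy as np

def run_starts(series):
    """Slicing approach: start indices of runs of repeated values (interior positions only)."""
    a = np.asarray(series)
    x = a[1:-1]  # a[i] for i = 1 .. n-2
    # a[2:] lines up with a[i+1] (next value), a[:-2] with a[i-1] (previous value)
    return np.flatnonzero((a[2:] == x) &amp; (x != a[:-2])) + 1

def run_starts_loop(series):
    """Plain-Python reference over the same interior indices 1 .. n-2."""
    return [i for i in range(1, len(series) - 1)
            if series[i] == series[i + 1] and series[i] != series[i - 1]]

series = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16, 5, 7, 9, 17, 17, 4, 8, 18, 18]
print(run_starts(series).tolist())                             # [5, 11, 17, 21]
print(run_starts(series).tolist() == run_starts_loop(series))  # True

# Randomized comparison on short integer sequences with many repeats
rng = np.random.default_rng(0)
for _ in range(100):
    s = rng.integers(0, 3, size=rng.integers(0, 30)).tolist()
    assert run_starts(s).tolist() == run_starts_loop(s)
</code></pre>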
<p><strong>Runtime test</strong> (for the working solutions)</p>
<p>Approaches -</p>
<pre><code>import numpy as np
import pandas as pd

def piRSquared1(series):
    # Positions where a value equals its successor
    d = np.flatnonzero(np.diff(series) == 0)
    # Keep only the first position of each block of consecutive positions
    w = np.append(True, np.diff(d) > 1)
    return d[w].tolist()

def piRSquared2(series):
    s = np.array(series)
    # "equals next" mask AND "differs from previous" mask
    return np.flatnonzero(
        np.append(s[:-1] == s[1:], True) &
        np.append(True, s[1:] != s[:-1])
    ).tolist()

def Zach(series):
    s = pd.Series(series)
    # Label runs of equal values, keep the first index of runs longer than 1
    i = [g.index[0] for _, g in s.groupby((s != s.shift()).cumsum()) if len(g) > 1]
    return i

def jezrael(series):
    s = pd.Series(series)
    s1 = s.shift(1).ne(s).cumsum()
    # First row of each run label that occurs more than once
    m = ~s1.duplicated() & s1.duplicated(keep=False)
    s2 = m.index[m].tolist()
    return s2

def divakar(series):
    a = np.array(series)
    x = a[1:-1]
    return (np.flatnonzero((a[2:] == x) & (x != a[:-2]))+1).tolist()
</code></pre>
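<p>The timings only compare like with like if the approaches return the same indices. As a sanity check, here is a sketch with the NumPy-only approaches copied from above (the pandas ones are left out to keep it dependency-light). On the sample input all three agree; on the small edge case <code>[1, 1, 2]</code> they do not, because the slicing approach only inspects interior positions and so never reports index 0 or the last index, <code>piRSquared1</code> reports the run starting at index 0, and <code>piRSquared2</code> as written also flags a last element that merely differs from its predecessor.</p>
<pre><code>import numpy as np

# NumPy-only approaches, copied from the listing above
def piRSquared1(series):
    d = np.flatnonzero(np.diff(series) == 0)
    w = np.append(True, np.diff(d) > 1)
    return d[w].tolist()

def piRSquared2(series):
    s = np.array(series)
    return np.flatnonzero(
        np.append(s[:-1] == s[1:], True) &amp;
        np.append(True, s[1:] != s[:-1])
    ).tolist()

def divakar(series):
    a = np.array(series)
    x = a[1:-1]
    return (np.flatnonzero((a[2:] == x) &amp; (x != a[:-2])) + 1).tolist()

series0 = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16, 5, 7, 9, 17, 17, 4, 8, 18, 18]
for f in (piRSquared1, piRSquared2, divakar):
    print(f.__name__, f(series0))  # all three print [5, 11, 17, 21]

edge = [1, 1, 2]  # a run starts at index 0; the last value differs from its neighbour
for f in (piRSquared1, piRSquared2, divakar):
    print(f.__name__, f(edge))
# piRSquared1 [0]
# piRSquared2 [0, 2]
# divakar []
</code></pre>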
<p>For the setup, we simply tile the sample input a number of times.</p>
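<p>Outside IPython, the same setup and a rough equivalent of <code>%timeit</code> can be put together with the standard library's <code>timeit</code> module. This is a minimal sketch: <code>make_series</code> and <code>best_of</code> are hypothetical helpers, the fixed <code>number</code>/<code>repeat</code> values are arbitrary choices (IPython picks loop counts automatically), and the absolute numbers will depend on the machine and NumPy version.</p>
<pre><code>import timeit
import numpy as np

series0 = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16, 5, 7, 9, 17, 17, 4, 8, 18, 18]

def make_series(reps):
    """Tile the sample input `reps` times and return a plain Python list."""
    return np.tile(series0, reps).tolist()

def divakar(series):
    a = np.array(series)
    x = a[1:-1]
    return (np.flatnonzero((a[2:] == x) &amp; (x != a[:-2])) + 1).tolist()

def best_of(func, series, number=10, repeat=3):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    times = timeit.repeat(lambda: func(series), number=number, repeat=repeat)
    return min(times) / number

series = make_series(10000)
print(len(series))  # 230000
print(f"divakar: {best_of(divakar, series) * 1e3:.2f} ms per call")
</code></pre>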
<p>Timings -</p>
<p>Case #1: Large set</p>
<pre><code>In [34]: series0 = [2,3,7,10,11,16,16,9,11,12,14,16,16,16,5,7,9,17,17,4,8,18,18]
In [35]: series = np.tile(series0,10000).tolist()
In [36]: %timeit piRSquared1(series)
...: %timeit piRSquared2(series)
...: %timeit Zach(series)
...: %timeit jezrael(series)
...: %timeit divakar(series)
...:
100 loops, best of 3: 8.06 ms per loop
100 loops, best of 3: 7.79 ms per loop
1 loop, best of 3: 3.88 s per loop
10 loops, best of 3: 24.3 ms per loop
100 loops, best of 3: 7.97 ms per loop
</code></pre>
<p>Case #2: Larger set (on the top two solutions)</p>
<pre><code>In [40]: series = np.tile(series0,1000000).tolist()
In [41]: %timeit piRSquared2(series)
1 loop, best of 3: 823 ms per loop
In [42]: %timeit divakar(series)
1 loop, best of 3: 823 ms per loop
</code></pre>
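<p>Now, the main difference between these two solutions is that the latter avoids appending. Let's take a closer look at them by running on a smaller dataset -</p>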
<pre><code>In [43]: series = np.tile(series0,100).tolist()
In [44]: %timeit piRSquared2(series)
10000 loops, best of 3: 89.4 µs per loop
In [45]: %timeit divakar(series)
10000 loops, best of 3: 82.8 µs per loop
</code></pre>
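<p>So, avoiding the concatenation/appending in the latter solution gives a small speedup on smaller datasets (82.8 µs vs 89.4 µs here, roughly 7% faster), while on larger datasets the two become comparable.</p>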
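<p>One plausible reason the gap closes on larger inputs (an assumption, not something the timings above isolate) is that both functions share the cost of converting the Python list with <code>np.array(series)</code>, and that shared cost can come to dominate the two extra array copies made by <code>np.append</code>. A sketch to test that hypothesis times the list-to-array conversion on its own next to both functions across a few sizes; <code>per_call</code> and <code>sweep</code> are hypothetical helpers and the sizes are arbitrary.</p>
<pre><code>import timeit
import numpy as np

series0 = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16, 5, 7, 9, 17, 17, 4, 8, 18, 18]

def piRSquared2(series):
    s = np.array(series)
    return np.flatnonzero(
        np.append(s[:-1] == s[1:], True) &amp;
        np.append(True, s[1:] != s[:-1])
    ).tolist()

def divakar(series):
    a = np.array(series)
    x = a[1:-1]
    return (np.flatnonzero((a[2:] == x) &amp; (x != a[:-2])) + 1).tolist()

def per_call(stmt, number):
    # Best of 3 repeats, reported per call in seconds
    return min(timeit.repeat(stmt, number=number, repeat=3)) / number

def sweep(reps_list=(100, 10000, 50000), number=3):
    rows = []
    for reps in reps_list:
        series = np.tile(series0, reps).tolist()
        t_convert = per_call(lambda: np.array(series), number)  # list -> array only
        t_pi2 = per_call(lambda: piRSquared2(series), number)
        t_div = per_call(lambda: divakar(series), number)
        rows.append((len(series), t_convert, t_pi2, t_div))
    return rows

for n, t_convert, t_pi2, t_div in sweep():
    print(f"n={n:>8}  convert={t_convert * 1e3:8.3f} ms  "
          f"piRSquared2={t_pi2 * 1e3:8.3f} ms  divakar={t_div * 1e3:8.3f} ms")
</code></pre>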
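<p>On larger datasets, a marginal improvement is possible by using a single concatenation in place of the final <code>+1</code> offset. So, the last step could be rewritten as -</p>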
<pre><code># Prepending False aligns mask positions with indices into a, so no +1 is needed
np.flatnonzero(np.concatenate(([False],(a[2:] == a[1:-1]) & (a[1:-1] != a[:-2]))))
</code></pre>
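<p>Prepending a single <code>False</code> shifts every mask position by one, so <code>np.flatnonzero</code> returns indices into <code>a</code> directly and the separate <code>+1</code> pass over the result goes away. A quick randomized check that both forms return the same indices; <code>starts_plus_one</code> and <code>starts_concat</code> are hypothetical names used only for this sketch.</p>
<pre><code>import numpy as np

def starts_plus_one(a):
    # Original form: positions in the sliced mask, shifted by +1 afterwards
    return np.flatnonzero((a[2:] == a[1:-1]) &amp; (a[1:-1] != a[:-2])) + 1

def starts_concat(a):
    # Variant: prepend False so mask positions already equal indices into a
    return np.flatnonzero(np.concatenate(([False], (a[2:] == a[1:-1]) &amp; (a[1:-1] != a[:-2]))))

rng = np.random.default_rng(42)
for _ in range(200):
    a = rng.integers(0, 4, size=rng.integers(0, 50))
    assert np.array_equal(starts_plus_one(a), starts_concat(a))
print("both forms agree")
</code></pre>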