You could use shift:
In [3815]: s = pd.Series(series)
In [3816]: cond = (s == s.shift(-1))
In [3817]: cond.index[cond]
Out[3817]: Int64Index([5, 11, 12, 17, 21], dtype='int64')
Alternatively, use diff:
In [3828]: cond = s.diff(-1).eq(0)
In [3829]: cond.index[cond]
Out[3829]: Int64Index([5, 11, 12, 17, 21], dtype='int64')
For list output, use tolist:
In [3833]: cond.index[cond].tolist()
Out[3833]: [5, 11, 12, 17, 21]
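Note that the comparison with shift(-1) also flags index 12, the middle of the three consecutive 16s. If only the first index of each run is wanted, one possible adjustment is to also require that the value differs from the previous one. This is a minimal sketch: `series0` is the sample list from the timing setup, and `same_as_next` and `same_as_prev` are illustrative names, not part of the original answer.

```python
import pandas as pd

series0 = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16,
           5, 7, 9, 17, 17, 4, 8, 18, 18]
s = pd.Series(series0)

same_as_next = s == s.shift(-1)  # True at 5, 11, 12, 17, 21
same_as_prev = s == s.shift(1)   # True at 6, 12, 13, 18, 22

# A run starts where the value equals the next one
# but differs from the previous one.
run_starts = s.index[same_as_next & ~same_as_prev].tolist()
print(run_starts)  # [5, 11, 17, 21]
```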
a = np.array(series)
out = np.flatnonzero((a[2:] == a[1:-1]) & (a[1:-1] != a[:-2]))+1
Sample run:
In [28]: a = np.array(series)
In [29]: np.flatnonzero((a[2:] == a[1:-1]) & (a[1:-1] != a[:-2]))+1
Out[29]: array([ 5, 11, 17, 21])
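To make the slicing easier to follow, here is the same expression broken into named pieces on a small array. The input and the names `prev`, `cur`, `nxt` and `mask` are illustrative assumptions, not taken from the question.

```python
import numpy as np

a = np.array([4, 7, 7, 7, 2, 9, 9])  # illustrative input

# Three aligned views, each of length len(a) - 2.
prev, cur, nxt = a[:-2], a[1:-1], a[2:]

# An interior position starts a run when it equals its successor
# and differs from its predecessor.
mask = (nxt == cur) & (cur != prev)

# mask position 0 corresponds to index 1 of `a`, hence the +1.
out = np.flatnonzero(mask) + 1
print(out.tolist())  # [1, 5]
```

Because index 0 has no predecessor in these slices, a run that begins at the very first element is not reported by this expression.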
Runtime test (for the working solutions)
Approaches:
import numpy as np
import pandas as pd

def piRSquared1(series):
    d = np.flatnonzero(np.diff(series) == 0)
    w = np.append(True, np.diff(d) > 1)
    return d[w].tolist()

def piRSquared2(series):
    s = np.array(series)
    return np.flatnonzero(
        np.append(s[:-1] == s[1:], True) &
        np.append(True, s[1:] != s[:-1])
    ).tolist()

def Zach(series):
    s = pd.Series(series)
    i = [g.index[0] for _, g in s.groupby((s != s.shift()).cumsum()) if len(g) > 1]
    return i

def jezrael(series):
    s = pd.Series(series)
    s1 = s.shift(1).ne(s).cumsum()
    m = ~s1.duplicated() & s1.duplicated(keep=False)
    s2 = m.index[m].tolist()
    return s2

def divakar(series):
    a = np.array(series)
    x = a[1:-1]
    return (np.flatnonzero((a[2:] == x) & (x != a[:-2]))+1).tolist()
For the setup, we simply tile the sample input many times.
Timings:
Case #1: Large set
In [34]: series0 = [2,3,7,10,11,16,16,9,11,12,14,16,16,16,5,7,9,17,17,4,8,18,18]
In [35]: series = np.tile(series0,10000).tolist()
In [36]: %timeit piRSquared1(series)
...: %timeit piRSquared2(series)
...: %timeit Zach(series)
...: %timeit jezrael(series)
...: %timeit divakar(series)
...:
100 loops, best of 3: 8.06 ms per loop
100 loops, best of 3: 7.79 ms per loop
1 loop, best of 3: 3.88 s per loop
10 loops, best of 3: 24.3 ms per loop
100 loops, best of 3: 7.97 ms per loop
Case #2: Larger set (on the top two solutions)
In [40]: series = np.tile(series0,1000000).tolist()
In [41]: %timeit piRSquared2(series)
1 loop, best of 3: 823 ms per loop
In [42]: %timeit divakar(series)
1 loop, best of 3: 823 ms per loop
In [43]: series = np.tile(series0,100).tolist()
In [44]: %timeit piRSquared2(series)
10000 loops, best of 3: 89.4 µs per loop
In [45]: %timeit divakar(series)
10000 loops, best of 3: 82.8 µs per loop
Create unique groups with shift and cumsum, then get a mask of the first duplicate in each group and filter by it (the jezrael function above).
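The grouping idea is easier to see with the intermediate values spelled out. The following is a small sketch on an illustrative input; the names `group_id`, `first_in_group` and `in_repeated_group` are assumptions made for readability.

```python
import pandas as pd

s = pd.Series([4, 7, 7, 7, 2, 9, 9])  # illustrative input

# A new group starts wherever the value differs from the previous one;
# cumsum turns those change points into group labels.
group_id = s.shift(1).ne(s).cumsum()
print(group_id.tolist())  # [1, 2, 2, 2, 3, 4, 4]

first_in_group = ~group_id.duplicated()              # first row of every group
in_repeated_group = group_id.duplicated(keep=False)  # rows of groups longer than 1

starts = s.index[first_in_group & in_repeated_group].tolist()
print(starts)  # [1, 5]
```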
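The divakar function above uses array slicing for performance, similar to the append-based approach but without any appending/concatenation.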
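The only difference between these two solutions is that the latter avoids the append. A closer look on a smaller dataset (the np.tile(series0,100) runs above) shows that avoiding the concatenation/append helps on smaller datasets (82.8 µs vs. 89.4 µs), while on larger datasets the two become comparable (823 ms each).
A marginal improvement on larger datasets is possible with a single concatenation, so the last step can be rewritten to use one concatenation in place of the two appends.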
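As a sketch of what a single-concatenation version could look like (this is an assumption, not the original author's code; `run_starts_one_concat` and `change` are illustrative names):

```python
import numpy as np

def run_starts_one_concat(series):
    a = np.asarray(series)
    # change[i] is True when a new run begins at index i; the trailing
    # True is a sentinel marking the end of the last run.
    change = np.concatenate(([True], a[1:] != a[:-1], [True]))
    # A run longer than one element begins at i when a run starts at i
    # and no new run starts at i + 1.
    return np.flatnonzero(change[:-1] & ~change[1:]).tolist()

series0 = [2, 3, 7, 10, 11, 16, 16, 9, 11, 12, 14, 16, 16, 16,
           5, 7, 9, 17, 17, 4, 8, 18, 18]
print(run_starts_one_concat(series0))  # [5, 11, 17, 21]
```

Unlike the pure slicing expression, this form also reports a run that begins at index 0. No timings are claimed for it here.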