Getting the indices of the first n unique values

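User

Reply #1 · edited 2024-05-15 09:35:05

Check the following. I use lexsort to get a sort order from the two arrays, then use diff and flatnonzero to find the group boundaries where split points are needed.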

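import numpy as np

# a and b are the two input arrays from the question
ind = np.lexsort((a, b))  # stable sort order: by b first, then by a

v=np.column_stack([a,b])

# positions in the sorted order where a new (a, b) group starts
sid=np.flatnonzero(np.any(np.diff(v[ind,:].T)>0,0))+1

# original indices, split into one array per group
yourlist=np.split(np.arange(len(a))[ind], sid)

n=1
np.concatenate([x[:n]for x in yourlist])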
Out[347]: array([ 0,  3,  4,  7,  8,  2, 10,  5])
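
For anyone who wants to try this without the original arrays, here is a minimal self-contained sketch of the same lexsort-and-split idea. The function name `first_n_per_pair_lexsort` and the sample arrays are made up for illustration and are not from the original post. Note that the result is grouped by pair (in lexsort order), not sorted by index.

```python
import numpy as np

def first_n_per_pair_lexsort(a, b, n):
    """Indices of the first n occurrences of each (a, b) pair, grouped by pair."""
    a = np.asarray(a)
    b = np.asarray(b)
    order = np.lexsort((a, b))               # stable: sorts by b, then by a
    pairs = np.column_stack([a, b])[order]   # rows in sorted order
    # A new group starts wherever either column changes between adjacent rows.
    boundaries = np.flatnonzero(np.any(pairs[1:] != pairs[:-1], axis=1)) + 1
    groups = np.split(order, boundaries)     # original indices, one array per pair
    return np.concatenate([g[:n] for g in groups])

# Hypothetical sample data (not from the original post).
a = np.array([1, 1, 2, 1, 2, 2])
b = np.array([5, 5, 5, 5, 7, 5])

print(first_n_per_pair_lexsort(a, b, 1))   # [0 2 4]
print(first_n_per_pair_lexsort(a, b, 2))   # [0 1 2 5 4]
```

Because lexsort is stable, the indices inside each group stay in their original order, so slicing with `[:n]` picks the earliest occurrences.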

User

Reply #2 · edited 2024-05-15 09:35:05

Approach #1: Very straightforward with pandas, if you can use it -

In [41]: import pandas as pd

In [42]: df = pd.DataFrame({'a':a,'b':b})

In [43]: [np.flatnonzero(df.groupby(['a','b']).cumcount()<n) for n in [1,2]]
Out[43]: 
[array([ 0,  2,  3,  4,  5,  7,  8, 10]),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])]
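
To make it easier to see what `cumcount` contributes here, the following is a small sketch on made-up data (the arrays below are hypothetical, not the ones from the question). It prints the per-group running count first and then the selected positions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original post).
a = np.array([1, 1, 2, 1, 2, 2])
b = np.array([5, 5, 5, 5, 7, 5])

df = pd.DataFrame({"a": a, "b": b})

# cumcount numbers the rows of each (a, b) group 0, 1, 2, ... in original row order.
counts = df.groupby(["a", "b"]).cumcount().to_numpy()
print(counts)     # [0 1 0 2 0 1]

n = 2
# Rows whose running count is below n are the first n occurrences of their pair.
first_n = np.flatnonzero(counts < n)
print(first_n)    # [0 1 2 4 5]
```

The result comes out sorted by index, since `flatnonzero` scans the counts in row order.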

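Approach #2: For integer input arrays, when performance matters, we can use a more NumPy-based version, such as -

# https://stackoverflow.com/a/43211485/ @Divakar
def array_cumcount(a):
    # a must be sorted; returns the running count 0, 1, 2, ... within each run of equal values
    idx = np.flatnonzero(a[1:] != a[:-1])+1
    shift_arr = np.ones(a.size,dtype=int)
    shift_arr[0] = 0

    if len(idx)>=1:
        shift_arr[idx[0]] = -idx[0]+1
        shift_arr[idx[1:]] = -idx[1:] + idx[:-1] + 1
    return shift_arr.cumsum()

# encode each (a, b) pair as one integer; assumes the values in b are non-negative
ab = a*(b.max()+1) + b
# a stable sort keeps equal keys in their original order
sidx = ab.argsort(kind='stable')
ab_s = ab[sidx]
# scatter the counts back to the original positions (inverse of the sort)
cumcounts = np.empty(ab.size, dtype=int)
cumcounts[sidx] = array_cumcount(ab_s)
out = [np.flatnonzero(cumcounts<n) for n in [1,2]]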
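
One caveat with the key `a*(b.max()+1) + b` used above: it only maps distinct pairs to distinct integers when the values in `b` are non-negative. Here is a small sketch of the problem and one possible workaround. The helper `encode_pairs` and the sample data are hypothetical, and shifting `b` by its minimum is a suggestion of this sketch, not part of the original answer.

```python
import numpy as np

def encode_pairs(a, b):
    """Map each (a, b) pair to one integer so that equal pairs get equal keys.

    b is shifted to start at 0 first (an addition in this sketch), so negative
    values cannot make two different pairs share a key.
    """
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    b0 = b - b.min()                   # now 0 <= b0 <= b0.max()
    return a * (b0.max() + 1) + b0

# Hypothetical data with a negative value in b.
a = np.array([0, 1, 0])
b = np.array([3, -1, 3])

naive = a * (b.max() + 1) + b          # the unshifted formula
print(naive)                           # [3 3 3]  -> (0, 3) and (1, -1) collide
print(encode_pairs(a, b))              # [4 5 4]  -> distinct pairs stay distinct
```

For very large values the product can still overflow int64, in which case falling back to `np.unique(..., axis=0, return_inverse=True)` for the group labels is probably safer.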

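User

Reply #3 · edited 2024-05-15 09:35:05

This is not a 100% NumPy solution: the last step uses a list comprehension. I am not sure whether a 100% NumPy solution is possible. Nevertheless:

Combine the two arrays into a 2-D array: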

ab2d = np.stack([a, b]).T

Find the unique rows (pairs):

uniq = np.unique(ab2d, axis=0)

For each unique pair, find its first N (smallest) indices in the 2-D array:

N = 2
np.concatenate([np.argwhere((pair == ab2d).all(axis=1))[:N, 0]
                for pair in uniq])
#array([ 0,  1,  3,  2,  4,  6,  5,  7,  8,  9, 10, 11])
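
On the question of whether a loop-free version is possible: it seems to be, at least along the following lines. This is only a sketch and not this poster's method. It uses the inverse labels from `np.unique` as group ids, a stable argsort to make the groups contiguous, and a running maximum to rank rows within each group. The function name `first_n_per_pair` and the sample data are made up for illustration.

```python
import numpy as np

def first_n_per_pair(a, b, n):
    """Sorted indices of the first n occurrences of each (a, b) pair, no Python loop."""
    pairs = np.column_stack([a, b])
    # labels[i] is the group id of row i. ravel() guards against NumPy versions
    # that return the inverse with an extra dimension when axis is given.
    _, labels = np.unique(pairs, axis=0, return_inverse=True)
    labels = labels.ravel()

    # Stable sort: groups become contiguous, original order is kept inside each group.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]

    pos = np.arange(labels.size)
    is_start = np.ones(labels.size, dtype=bool)
    is_start[1:] = sorted_labels[1:] != sorted_labels[:-1]
    # Position where the current group starts, carried forward by a running maximum.
    group_start = np.maximum.accumulate(np.where(is_start, pos, 0))
    rank = pos - group_start             # 0, 1, 2, ... within each group

    return np.sort(order[rank < n])

# Hypothetical sample data (not from the original post).
a = np.array([1, 1, 2, 1, 2, 2])
b = np.array([5, 5, 5, 5, 7, 5])

print(first_n_per_pair(a, b, 1))   # [0 2 4]
print(first_n_per_pair(a, b, 2))   # [0 1 2 4 5]
```

Unlike the list-comprehension version, the result here is sorted by index rather than grouped by pair; drop the final `np.sort` to keep the grouped order.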
