Group indices of unique elements in numpy

>>> import numpy as np
>>> from collections import defaultdict
>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> d = defaultdict(list)
>>> for i, e in enumerate(a):
...     d[e].append(i)
...
>>> d
defaultdict(<type 'list'>, {1: [0], 2: [1, 4, 6], 3: [5], 4: [3], 6: [2]})

3 answers

User

Answer 1 · edited 2024-06-16 10:28:43

This can be solved with pandas (the Python data analysis library) and a DataFrame.groupby call.

Consider the following:

 import numpy as np
 import pandas as pd

 a = np.array([1, 2, 6, 4, 2, 3, 2])
 df = pd.DataFrame({'a': a})

 gg = df.groupby(by=df.a)
 gg.groups

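The output, gg.groups, is a dict mapping each unique value of a to the index labels of the rows where it occurs. With the default integer index these labels are the same positions as in the defaultdict in the question.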
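If positional indices as numpy arrays are preferred over index labels, pandas GroupBy objects also expose an `indices` attribute. A minimal sketch, assuming a reasonably recent pandas; for the default RangeIndex used here, labels and positions coincide, so the result matches the loop in the question:

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 6, 4, 2, 3, 2])
df = pd.DataFrame({'a': a})

# GroupBy.indices maps each group key to an array of *positional* row indices
positions = df.groupby('a').indices

# Sort by key and convert numpy scalars/arrays to plain ints/lists for display
result = {int(k): v.tolist() for k, v in sorted(positions.items())}
print(result)  # {1: [0], 2: [1, 4, 6], 3: [5], 4: [3], 6: [2]}
```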

User

Answer 2 · edited 2024-06-16 10:28:43

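This is very similar to what was asked here, so what follows is an adaptation of my answer there. The simplest way to vectorize this is to use sorting. The code below borrows heavily from the implementation of np.unique in the upcoming 1.9 release, which adds the ability to count unique items, see here: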

>>> import numpy as np
>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> sort_idx = np.argsort(a, kind='mergesort')  # stable: keeps indices ascending within each group
>>> a_sorted = a[sort_idx]
>>> unq_first = np.concatenate(([True], a_sorted[1:] != a_sorted[:-1]))
>>> unq_items = a_sorted[unq_first]
>>> unq_count = np.diff(np.nonzero(unq_first)[0])

Now:

>>> unq_items
array([1, 2, 3, 4, 6])
>>> unq_count
array([1, 3, 1, 1], dtype=int64)

To get the positional indices of each value, simply do:

>>> unq_idx = np.split(sort_idx, np.cumsum(unq_count))
>>> unq_idx
[array([0], dtype=int64), array([1, 4, 6], dtype=int64), array([5], dtype=int64),
 array([3], dtype=int64), array([2], dtype=int64)]

You can now build the dictionary by zipping unq_items and unq_idx.
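Putting the steps together, here is a self-contained sketch of the sorting approach as a single function. The name `group_indices` is hypothetical; it uses `kind='stable'` (available since NumPy 1.15) so positions within each group stay in ascending order, and it takes the split points directly from the group start offsets, which is equivalent to `np.cumsum(unq_count)`:

```python
import numpy as np

def group_indices(a):
    """Hypothetical helper: map each unique value of `a` to the array of
    positions where it occurs, using one sort instead of a Python loop."""
    a = np.asarray(a)
    if a.size == 0:
        return {}
    # Stable sort keeps equal elements in original order -> ascending positions
    sort_idx = np.argsort(a, kind='stable')
    a_sorted = a[sort_idx]
    # True at the first occurrence of each distinct value in the sorted array
    unq_first = np.concatenate(([True], a_sorted[1:] != a_sorted[:-1]))
    unq_items = a_sorted[unq_first]
    # Start offsets of every group except the first are the split points
    split_points = np.nonzero(unq_first)[0][1:]
    unq_idx = np.split(sort_idx, split_points)
    # Zip unique values (as Python scalars) with their index arrays
    return dict(zip(unq_items.tolist(), unq_idx))

a = np.array([1, 2, 6, 4, 2, 3, 2])
groups = group_indices(a)
print({k: v.tolist() for k, v in groups.items()})
# {1: [0], 2: [1, 4, 6], 3: [5], 4: [3], 6: [2]}
```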

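Note that unq_count does not include the number of occurrences of the last unique item, because that is not needed to split the index array. If you want the counts for all values, you can do: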

>>> unq_count = np.diff(np.concatenate(np.nonzero(unq_first) + ([a.size],)))
>>> unq_idx = np.split(sort_idx, np.cumsum(unq_count[:-1]))
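Since NumPy 1.9, `np.unique` can return the counts directly via `return_counts=True`, which yields the full count array (including the last item) without the `diff` step. A minimal sketch of the same split using it, again assuming a stable argsort so each group's positions stay ascending:

```python
import numpy as np

a = np.array([1, 2, 6, 4, 2, 3, 2])

# Sorted unique values and how many times each one occurs
unq_items, unq_count = np.unique(a, return_counts=True)

# Stable argsort lists positions grouped by value, ascending within a group
sort_idx = np.argsort(a, kind='stable')

# Split before the start of every group except the first
unq_idx = np.split(sort_idx, np.cumsum(unq_count)[:-1])

print(unq_count.tolist())             # [1, 3, 1, 1, 1]
print([g.tolist() for g in unq_idx])  # [[0], [1, 4, 6], [5], [3], [2]]
```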

User

Answer 3 · edited 2024-06-16 10:28:43

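The numpy_indexed package (disclaimer: I am its author) implements a solution inspired by Jaime's, but with tests, a nice interface, and lots of related functionality: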

import numpy_indexed as npi
unique, idx_groups = npi.group_by(a, np.arange(len(a)))
