Counting unique values in subsets of a sorted array

import numpy as np

users = np.array([111, 222, 333])
info = np.zeros(len(users))

# np.float was removed from NumPy; np.float64 is the equivalent dtype
dt = [('id', np.int32), ('group', np.int16), ('other', np.float64)]
dat = np.array([(111, 1, 0.0), (111, 3, 0.0), (111, 2, 0.0), (111, 1, 0.0),
                (222, 1, 0.0), (222, 1, 0.0), (222, 4, 0.0),
                (333, 2, 0.0), (333, 1, 0.0), (333, 2, 0.0)], dtype=dt)

for i, u in enumerate(users):
    # select the rows belonging to this user
    u_dat = dat[dat['id'] == u]
    # count the distinct groups among those rows
    uniq = set(u_dat['group'])
    info[i] = len(uniq)

print(info)

1 Answer

User

#1 · Posted 2024-04-25 21:02:02

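If you want to benefit from numpy's vectorization, it helps a lot if you can remove all duplicates from dat beforehand. You can then find the first and last occurrence of each value with two calls to searchsorted: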

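# drop duplicate records; the result is sorted, with 'id' as the primary key
dat_unq = np.unique(dat)
# start and end of each user's run of rows in the deduplicated array
first = dat_unq['id'].searchsorted(users, side='left')
last = dat_unq['id'].searchsorted(users, side='right')
# run length = number of unique records per user
info = last - first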
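
To make this concrete, here is a minimal sketch that runs the vectorized variant on the sample data from the question. It assumes that duplicates agree in every field, since np.unique compares whole records; that holds here because other is always 0.0. np.float64 is used in place of np.float, which no longer exists in current NumPy releases.

```python
import numpy as np

users = np.array([111, 222, 333])

# Sample data from the question; np.float64 replaces the removed np.float alias.
dt = [('id', np.int32), ('group', np.int16), ('other', np.float64)]
dat = np.array([(111, 1, 0.0), (111, 3, 0.0), (111, 2, 0.0), (111, 1, 0.0),
                (222, 1, 0.0), (222, 1, 0.0), (222, 4, 0.0),
                (333, 2, 0.0), (333, 1, 0.0), (333, 2, 0.0)], dtype=dt)

# Deduplicate whole records; the result is sorted by id, then group, then other.
dat_unq = np.unique(dat)

# Each user's rows form one contiguous run in the sorted 'id' column.
first = dat_unq['id'].searchsorted(users, side='left')
last = dat_unq['id'].searchsorted(users, side='right')

# The length of each run is the number of unique records for that user.
info = last - first
print(info)  # [3 2 2]
```

A user that does not occur in dat gets first == last and therefore a count of 0.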

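This only pays off if you are looking up a large share of the entries in dat. If it is only a small fraction, you can still use the two calls to searchsorted to determine the slices on which to call unique: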

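info = np.empty_like(users, dtype=np.intp)
# dat must already be sorted by 'id' for searchsorted to work here
first = dat['id'].searchsorted(users, side='left')
last = dat['id'].searchsorted(users, side='right')
for idx, (start, stop) in enumerate(zip(first, last)):
    # deduplicate only the slice that belongs to this user
    info[idx] = len(np.unique(dat[start:stop]))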
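
The loop above counts unique whole records. If the other field can differ between rows of the same group, a variant that deduplicates only the group column may be closer to what the question asks for. The following is a sketch under that assumption; the function name count_unique_groups and the sample values are made up for illustration, and dat is assumed to be sorted by id.

```python
import numpy as np

dt = [('id', np.int32), ('group', np.int16), ('other', np.float64)]
# Illustrative data, sorted by id; 'other' varies within a group on purpose.
dat = np.array([(111, 1, 0.5), (111, 1, 0.7), (111, 2, 0.0),
                (222, 4, 0.1), (222, 4, 0.2),
                (333, 2, 0.0)], dtype=dt)
users = np.array([111, 333])  # only a subset of the ids is looked up


def count_unique_groups(dat, users):
    """Count distinct 'group' values per user (hypothetical helper)."""
    # Locate each user's contiguous block of rows in the sorted id column.
    first = dat['id'].searchsorted(users, side='left')
    last = dat['id'].searchsorted(users, side='right')
    info = np.empty(len(users), dtype=np.intp)
    for idx, (start, stop) in enumerate(zip(first, last)):
        # Deduplicate only the group column of this user's slice.
        info[idx] = len(np.unique(dat['group'][start:stop]))
    return info


info = count_unique_groups(dat, users)
print(info)  # [2 1]
```

With this data, counting whole records for user 111 would give 3, because the two rows in group 1 differ in other.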
