import numpy as np
pred = [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
'User4', 'User4']
# assign integer indices to each unique user name, and get the total
# number of occurrences for each name
unames, idx, counts = np.unique(users, return_inverse=True, return_counts=True)
# now sum the values of pred corresponding to each index value
sum_pred = np.bincount(idx, weights=pred)
# finally, divide by the number of occurrences for each user name
mean_pred = sum_pred / counts
print(unames)
# ['User0' 'User1' 'User2' 'User3' 'User4']
print(mean_pred)
# [ 0.45 0.55 0.55 0.435 0.81666667]
If you want to stick with NumPy, the simplest approach is to use np.unique and np.bincount, as in the code above.
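A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a vectorized solution similar to the one Jaime proposed, but with a cleaner interface and more tests.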
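The code from that answer did not survive on this page, so the following is only a rough sketch of the general idea: wrapping a vectorized group-mean in one reusable function. The helper name `group_mean` is hypothetical and is not numpy_indexed's API. It also uses a sort plus `np.add.reduceat` rather than `np.bincount`, and it assumes a non-empty input.

```python
import numpy as np

def group_mean(keys, values):
    """Hypothetical helper: mean of `values` per unique key (non-empty input assumed)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # sort so that equal keys become contiguous runs
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # index at which each run of equal keys starts
    is_start = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
    starts = np.flatnonzero(is_start)
    # sum each run, then divide by the run length
    sums = np.add.reduceat(sorted_values, starts)
    counts = np.diff(np.append(starts, len(sorted_keys)))
    return sorted_keys[starts], sums / counts

pred = [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
         'User4', 'User4']
unames, means = group_mean(users, pred)
print(unames)
print(means)
```

Putting the steps in one function keeps the calling code to a single line, which is roughly the convenience a dedicated grouping library aims for.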
A "pure NumPy" solution might use a combination of np.unique and np.bincount.
If pandas is installed, DataFrames have a built-in method for this.
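The pandas snippet was lost from this page, so here is a minimal sketch of what it may have looked like. It assumes the method in question is `groupby`, and the column names `user` and `pred` are chosen only for illustration.

```python
import pandas as pd

pred = [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
         'User4', 'User4']

# one row per observation; column names are illustrative assumptions
df = pd.DataFrame({'user': users, 'pred': pred})

# group rows by user name and average the predictions within each group
mean_by_user = df.groupby('user')['pred'].mean()
print(mean_by_user)
```

The result is a Series indexed by user name, so a single user's mean can be read with `mean_by_user['User4']`.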