Converting the elements of a numpy.array to their rank order when duplicates are present

2 votes

3 answers

539 views

Asked 2025-04-18 10:25

I'm looking for an efficient way to do the following:

If my input is:

np.array([9,0,1,0,3,0])

then I want my output to be:

np.array([0,3,2,3,1,3]) # 9 is the highest, so it gets rank 0
                        # 3 is the second highest, so it gets rank 1
                        # 1 is third highest, so it gets rank 2
                        # 0's are fourth highest so they get rank 3

I want to apply this to a 2-D matrix:

Input:

a = np.array([[9,0,1,0,3,0],
              [0,1,2,3,4,5],
              [0.01,0.3,2,100,1,1],
              [0,0,0,0,1,1],
              [4,4,4,4,4,4]])

Output:

>>> get_order_array(a)
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])

I can achieve this with the code below. However, it feels very inefficient, so I'm hoping someone can suggest a better way to get the same result.

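import numpy as np

def get_order(x):
    # Sorted (ascending) unique values of the row.
    unique_x = np.unique(x)
    # unique_x is already sorted, so this yields n-1, ..., 0:
    # the largest value gets rank 0.
    step_1 = np.argsort(unique_x)[::-1]
    temp_dict = dict(zip(unique_x, step_1))
    return np.vectorize(temp_dict.get)(x)

def get_order_array(x):
    # np.int has been removed from NumPy; the builtin int works instead.
    new_array = np.empty(x.shape, dtype=int)
    for i in range(x.shape[0]):  # range replaces Python 2's xrange
        new_array[i] = get_order(x[i])
    return new_array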
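As a hedged aside, the dictionary lookup in get_order can be avoided for a single row. The sketch below is not part of the original question: it assumes a 1-D input and uses np.unique with return_inverse=True, and the function name dense_rank_desc is made up for illustration.

```python
import numpy as np

def dense_rank_desc(x):
    """Dense rank of a 1-D array, largest value first (hypothetical helper)."""
    # uniq holds the sorted unique values; inverse[i] is the index of x[i] in uniq.
    uniq, inverse = np.unique(x, return_inverse=True)
    # inverse is an ascending dense rank, so flip it to make the largest value rank 0.
    return (len(uniq) - 1) - inverse

x = np.array([9, 0, 1, 0, 3, 0])
ranks_1d = dense_rank_desc(x)
print(ranks_1d)  # [0 3 2 3 1 3]
```

This still has to be applied row by row for a 2-D array, so it only removes the np.vectorize step, not the Python loop.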

data processing, numpy, array manipulation, duplicate handling, matrix transformation

3 Answers

Basically:

order = a.argsort(axis=1)
ranks = order.argsort(axis=1)

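And no, I did not come up with this clever answer myself. See:

Rank items in an array using Python/NumPy

There you can also find a method for the case where you want equal numbers to share the same rank. (The method above gives duplicate numbers consecutive ranks.)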
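To make the caveat about duplicates concrete, here is a small sketch of the double argsort on the first row of the question's input. Two things are my own additions rather than part of the answer: kind="stable", which only makes the order of tied elements deterministic, and negating the array to rank from largest to smallest.

```python
import numpy as np

a = np.array([[9, 0, 1, 0, 3, 0]])

# Ascending ranks: argsort of argsort gives each element's position in sorted order.
order = a.argsort(axis=1, kind="stable")
ranks_asc = order.argsort(axis=1)
print(ranks_asc)   # [[5 0 3 1 4 2]]

# Descending ranks: negate the values first.
order_desc = (-a).argsort(axis=1, kind="stable")
ranks_desc = order_desc.argsort(axis=1)
print(ranks_desc)  # [[0 3 2 4 1 5]]
```

The three zeros receive ranks 3, 4 and 5 rather than all sharing rank 3, so this alone does not produce the output the question asks for.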

Answered 2025-04-18 by Python大师

A little bit of cumsum trickery goes a long way:

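# Column indices that sort each row in descending order
a_idx = np.argsort(a, axis=-1)[:, ::-1]
a_sorted = a[np.arange(a.shape[0])[:, None], a_idx]
# True wherever a sorted value differs from its left neighbour
# (np.bool has been removed from NumPy; the builtin bool works instead)
a_diff = np.zeros_like(a_sorted, dtype=bool)
a_diff[:, 1:] = a_sorted[:, :-1] != a_sorted[:, 1:]
# Running count of distinct values = dense rank in sorted order
a_sorted_ranks = np.cumsum(a_diff, axis=1)
# Undo the sort so each rank lands at its element's original position
a_ranks = a_sorted_ranks[np.arange(a_sorted_ranks.shape[0])[:, None],
                         np.argsort(a_idx, axis=1)]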
>>> a_ranks
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])
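The intermediate arrays are easier to follow on a single row. The following is a sketch of the same steps in 1-D, with variable names of my own choosing, using the first row of the question's input:

```python
import numpy as np

row = np.array([9, 0, 1, 0, 3, 0])

idx = np.argsort(row)[::-1]           # indices that sort the row in descending order
sorted_desc = row[idx]                # [9 3 1 0 0 0]

# Mark the positions where a new distinct value starts (the first slot stays False).
is_new = np.zeros(row.shape, dtype=bool)
is_new[1:] = sorted_desc[:-1] != sorted_desc[1:]   # [F T T T F F]

sorted_ranks = np.cumsum(is_new)      # [0 1 2 3 3 3]

# argsort(idx) is the inverse permutation, which undoes the sort.
ranks = sorted_ranks[np.argsort(idx)]
print(ranks)                          # [0 3 2 3 1 3]
```

The order in which argsort places the tied zeros does not matter here, because every zero ends up with the same cumulative count.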

Answered 2025-04-18 by Python大师

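@Jaime's answer is great (as usual!). Here is an alternative that uses scipy.stats.rankdata.

In rankdata's terminology, you want a "dense" ranking. You also want the ranks in the reverse of the usual order. To get that reversed order, we pass -a to rankdata. We also subtract 1 from the ranks so that they start at 0 instead of 1. Finally, you want to rank the rows of a 2-D array. rankdata only handled 1-D data when this was written (newer SciPy releases accept an axis argument), so we loop over the rows.

Here's the code: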

import numpy as np
from scipy.stats import rankdata


def get_order_array(a):
    b = np.empty(a.shape, dtype=int)
    for k, row in enumerate(a):
        b[k] = rankdata(-row, method='dense') - 1
    return b


if __name__ == "__main__":
    a = np.array([[9,0,1,0,3,0],
                  [0,1,2,3,4,5],
                  [0.01,0.3,2,100,1,1],
                  [0,0,0,0,1,1],
                  [4,4,4,4,4,4]])
    print(get_order_array(a))  # print() call for Python 3

Output:

[[0 3 2 3 1 3]
 [5 4 3 2 1 0]
 [4 3 1 0 2 2]
 [1 1 1 1 0 0]
 [0 0 0 0 0 0]]
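If a reasonably recent SciPy is available, the row loop may not be needed. The sketch below assumes SciPy 1.4 or later, where rankdata accepts an axis argument; the function name get_order_array_axis is made up here, and the cast to int is a precaution because the returned dtype is not something I would rely on.

```python
import numpy as np
from scipy.stats import rankdata

def get_order_array_axis(a):
    """Dense, descending, zero-based ranks along each row (hypothetical variant)."""
    # Negating the values reverses the order; subtracting 1 makes ranks start at 0.
    return (rankdata(-a, method="dense", axis=1) - 1).astype(int)

a = np.array([[9, 0, 1, 0, 3, 0],
              [0, 1, 2, 3, 4, 5],
              [0.01, 0.3, 2, 100, 1, 1],
              [0, 0, 0, 0, 1, 1],
              [4, 4, 4, 4, 4, 4]])
result = get_order_array_axis(a)
print(result)
```

On the example input this should print the same matrix as the loop version above.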

Answered 2025-04-18 by Python大师
