How is pandas rank calculated?

In this example, the highest value is 7. Why do we get rank 5.5 for the number 7 and rank 1.5 for the number 4?

S1 = pd.Series([7,6,7,5,4,4])
S1.rank()

Output:

0    5.5
1    4.0
2    5.5
3    3.0
4    1.5
5    1.5
dtype: float64

3 answers

User

Answer 1 · edited 2024-04-25 12:57:42

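As Joachim commented, the rank function accepts a method argument whose default value is 'average'. That is, each value in a group of ties receives the average of the ranks the group would occupy.

According to the documentation, the options for method are: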

method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
    How to rank the group of records that have the same value (i.e. ties):
    average: average rank of the group
    min: lowest rank in the group
    max: highest rank in the group
    first: ranks assigned in order they appear in the array
    dense: like 'min', but rank always increases by 1 between groups
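To see how the five options differ on the series from the question, the following sketch puts them side by side. The names `methods` and `comparison` are illustrative only and do not come from the original answer.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# One column per tie-breaking method, all computed on the same data.
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: S1.rank(method=m) for m in methods})

# Put the original values in the first column for easier reading.
comparison.insert(0, 'value', S1)
print(comparison)
```

Only the rows for the tied values (the two 7s and the two 4s) differ between 'average', 'min', 'max' and 'first'. With 'dense', every rank after the first tie also shifts, because no rank numbers are skipped.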

For example, let's try method='dense'. S1.rank(method='dense') gives:

0    4.0
1    3.0
2    4.0
3    2.0
4    1.0
5    1.0
dtype: float64

This is roughly equivalent to factorize: the dense rank of a value is its position among the sorted unique values.
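The relationship to factorize can be checked with a small sketch. It assumes `pd.factorize` is called with `sort=True`, so that the codes follow the sorted unique values. The codes start at 0, so 1 is added to line them up with ranks. The variable names are illustrative.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# With sort=True, factorize numbers the sorted unique values 0, 1, 2, ...
codes, uniques = pd.factorize(S1, sort=True)

# Shift to 1-based and cast to float so the result matches rank()'s dtype.
dense_from_codes = pd.Series(codes + 1, index=S1.index, dtype='float64')
dense_rank = S1.rank(method='dense')

print(dense_from_codes.equals(dense_rank))
```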

Update: following your question, let's try to write a function that behaves like S1.rank():

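import numpy as np
import pandas as pd

def my_rank(s):
    # sort s by values; mergesort is stable, so ties keep their original order
    s_sorted = s.sort_values(kind='mergesort')

    # incremental ranks 1..n in sorted order,
    # equivalent to s.rank(method='first')
    ranks = pd.Series(np.arange(len(s_sorted)) + 1, index=s_sorted.index)

    # average the incremental ranks within each group of equal values
    avg_ranks = ranks.groupby(s_sorted).transform('mean')

    # restore the original order (assumes the index labels are unique)
    return avg_ranks.reindex(s.index)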
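The same averaging idea can be written more compactly by starting from `rank(method='first')` and averaging within groups of equal values. This is a sketch for comparison, not how pandas implements ranking internally, and `avg_rank_via_first` is a hypothetical name.

```python
import pandas as pd

def avg_rank_via_first(s):
    # Ranks 1..n with ties broken by order of appearance.
    first = s.rank(method='first')
    # Replace each rank by the mean rank of all entries sharing its value.
    return first.groupby(s).transform('mean')

S1 = pd.Series([7, 6, 7, 5, 4, 4])
result = avg_rank_via_first(S1)

# Raises an AssertionError if the two results differ.
pd.testing.assert_series_equal(result, S1.rank())
print(result)
```

Because `transform` keeps the index of the series it is called on, the result comes back in the original order without any reindexing.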

User

Answer 2 · edited 2024-04-25 12:57:42

If you want tied values to receive the maximum rank instead of the default average rank, do the following:

S1 = pd.Series([7,6,7,5,4,4])
S1.rank(method='max')

Here are all the ranking methods that pandas supports:

method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'

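# Collect the rankings as DataFrame columns; assigning S1['name'] on a
# Series would add a new element to the Series instead of a column.
df = pd.DataFrame({'value': S1})
df['default_rank'] = df['value'].rank()
df['max_rank'] = df['value'].rank(method='max')
df['NA_bottom'] = df['value'].rank(na_option='bottom')
df['pct_rank'] = df['value'].rank(pct=True)
print(df)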
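The na_option and pct arguments have no visible effect unless the data contains missing values or the ranks are compared with the plain ones. The sketch below uses a made-up series `s_nan`, not taken from the answer, with one NaN to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's series with one value replaced by NaN.
s_nan = pd.Series([7, 6, np.nan, 5, 4, 4])

keep = s_nan.rank()                       # default na_option='keep': NaN stays NaN
bottom = s_nan.rank(na_option='bottom')   # NaN is ranked after every real value
top = s_nan.rank(na_option='top')         # NaN is ranked before every real value
print(pd.DataFrame({'value': s_nan, 'keep': keep, 'bottom': bottom, 'top': top}))

# pct=True rescales the ranks to fractions of the number of values.
S1 = pd.Series([7, 6, 7, 5, 4, 4])
pct = S1.rank(pct=True)
print(np.allclose(pct, S1.rank() / len(S1)))
```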

User

Answer 3 · edited 2024-04-25 12:57:42

The rank is calculated as follows.

Sort the elements in ascending order and assign ranks starting from 1 for the lowest element:

Elements - 4, 4, 5, 6, 7, 7
Ranks    - 1, 2, 3, 4, 5, 6

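Then account for duplicates: average the ranks of each group of equal elements and assign that average to every element in the group.

Since 4 occurs twice, the final rank of each occurrence is the average of 1 and 2, which is 1.5. In the same way, for 7 the final rank of each occurrence is the average of 5 and 6, which is 5.5.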

Elements -   4,   4,   5, 6, 7,   7
Ranks    -   1,   2,   3, 4, 5,   6
Final Rank - 1.5, 1.5, 3, 4, 5.5, 5.5
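The two steps above can be sketched in plain Python without pandas. The function name `average_ranks` is hypothetical and the code only mirrors the hand calculation, not the library's internals.

```python
def average_ranks(values):
    # Step 1: indices of the elements in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Step 2: find the end of the run of equal values starting at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The run occupies ranks pos+1 .. end+1; their mean is the final rank.
        avg = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

print(average_ranks([7, 6, 7, 5, 4, 4]))
# [5.5, 4.0, 5.5, 3.0, 1.5, 1.5]
```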
