Sorting a DataFrame by a categorical column, but in a specific category order

Posted 2024-04-29 00:10:51


I want to use df_selected = df_targets.head(N) to select the top entries of a Pandas DataFrame, ranked by the values in a particular column.

Each entry has a target value (listed here in order of importance):

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV  

Unfortunately, if I do

df_targets = df_targets.sort("target")

the sort is alphabetical (GOTV, Likely Supporter, …).

I was hoping for a keyword like list_ordering, as in:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"] 
df_targets = df_targets.sort("target", list_ordering=my_list)

To work around this, I created a dictionary:

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

but this seems like an un-Pythonic approach.

Any suggestions would be much appreciated.


3 answers

I think you need Categorical with the parameter ordered=True; sorting with sort_values then works very nicely.

See the documentation for Categorical:

Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a, 
                      categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                      ordered=True)

print (df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print (df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

df.sort_values('a', inplace=True)
print (df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
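
The documentation quote above also mentions min and max values. As a minimal sketch of what that means for this data (the variable names order, s, lowest, highest and at_most_gotv are made up for illustration), an ordered Categorical supports min(), max() and comparisons that follow the custom order rather than the alphabet:

```python
import pandas as pd

# Custom order from the question, most important first.
order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

s = pd.Series(
    pd.Categorical(
        ["GOTV", "Persuasion", "Likely Supporter",
         "GOTV", "Persuasion", "Persuasion+GOTV"],
        categories=order,
        ordered=True,
    )
)

# min/max follow the category order, not alphabetical order.
lowest = s.min()
highest = s.max()

# Comparisons against a category value also follow the custom order.
at_most_gotv = s[s <= "GOTV"].tolist()

print(lowest, highest, at_most_gotv)
```

Without ordered=True, min(), max() and the <= comparison raise a TypeError, so the flag matters if you need more than sorting.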

I think this is the best option in some situations. Say this is your preferred order:

my_order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

Then just do

df['Column_to_update'].cat.reorder_categories(my_order, inplace= True)

It is flexible and does not require assigning new categories. However, your column must have dtype = 'category', otherwise it will not work.
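
A small sketch of the same idea without the inplace argument, which recent pandas releases no longer accept as far as I know. The column name target and the sample rows are assumptions for illustration. The column is first converted to the category dtype, and the result of reorder_categories is assigned back:

```python
import pandas as pd

my_order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Illustrative data; the column name "target" is an assumption.
df = pd.DataFrame(
    {"target": ["GOTV", "Persuasion", "Likely Supporter", "Persuasion+GOTV"]}
)

# The .cat accessor only exists on category columns, so convert first.
df["target"] = df["target"].astype("category")

# reorder_categories returns a new Series; assign it back to the column.
df["target"] = df["target"].cat.reorder_categories(my_order, ordered=True)

sorted_targets = df.sort_values("target")["target"].tolist()
print(sorted_targets)
```

Note that reorder_categories expects my_order to contain exactly the categories already present in the column; it raises a ValueError otherwise.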

Read more here (Pandas documentation)

The approach shown in my previous answer is now deprecated.

Instead, it is better to use pandas.Categorical, as shown here

So:

list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  
df["target"] = pd.Categorical(df["target"], categories=list_ordering) 
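
Putting this together with the original goal of taking the top N rows, a sketch might look like the following. The voter_id column and the sample rows are hypothetical, and kind="stable" is used only so that rows with the same target keep their original relative order:

```python
import pandas as pd

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data; voter_id is invented for illustration.
df_targets = pd.DataFrame({
    "voter_id": [101, 102, 103, 104, 105, 106],
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
})

# Encode the custom order in the column itself.
df_targets["target"] = pd.Categorical(
    df_targets["target"], categories=list_ordering, ordered=True
)

N = 3
# Sort by the custom order, then keep the first N rows.
df_selected = df_targets.sort_values("target", kind="stable").head(N)
print(df_selected)
```

Any value in the column that is missing from list_ordering becomes NaN after the conversion, so it is worth checking for unexpected labels first.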
