Getting the most frequently occurring string value?

Posted 2024-03-28 23:13:32


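I have a dataset of NBA player game data. Is there an equivalent of the mode statistic, which is usually applied to numeric values, that returns the most frequently occurring string value instead?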

    t1_start1         t1_start2         t1_start3     t1_start4     t1_start5   team1
0   Shaquille O'Neal  Kobe Bryant       Horace Grant  Ron Harper    Rick Fox    LAL
1   Shaquille O'Neal  Kobe Bryant       Horace Grant  Ron Harper    Rick Fox    LAL
2   Kobe Bryant       Shaquille O'Neal  Horace Grant  Ron Harper    Brian Shaw  LAL
3   Kobe Bryant       Shaquille O'Neal  Horace Grant  Brian Shaw    Ron Harper  LAL
4   Kobe Bryant       Shaquille O'Neal  Horace Grant  Ron Harper    Brian Shaw  LAL
5   LeBron James      Brandon Ingram    Kyle Kuzma    JaVale McGee  Lonzo Ball  LAL

Regardless of the players' starting order (t1_start1 | t1_start2 | t1_start3 | ...), how can I get the 5 most frequent players in the last 3 rows, grouped by the 'team1' column?


2 answers
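import pandas as pd
flat_list = df.filter(like='t1_start').iloc[:3].values.flatten()  # first 3 rows of the player columns, flattened to a 1-D array
print(pd.Series(flat_list).mode()[0])  # the most common element; scipy.stats.mode no longer accepts strings (SciPy >= 1.11)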
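For reference: recent SciPy releases (1.11 onward) raise an error when `scipy.stats.mode` receives non-numeric data, while pandas' `Series.mode()` accepts strings and returns every value tied for the top count, in sorted order. A small self-contained sketch, with the player names typed in by hand from the first three rows of the question's table:

```python
import pandas as pd

# Player columns from the first three rows of the question's table
rows = [
    ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox"],
    ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox"],
    ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw"],
]

# Flatten the rows into one Series of names, ignoring the starting slot
flat = pd.Series([name for row in rows for name in row])

# mode() returns all values tied for the highest count, sorted
modes = flat.mode()
print(modes.tolist())
```

Here four players tie with three starts each, so taking only `mode()[0]` hides the tie; inspecting the whole result is safer.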

If you want more than one value, you can use collections.Counter:

from collections import Counter
most_common_5 = Counter(flat_list).most_common(5)  # list of (player, count) pairs, most frequent first
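A runnable sketch of the `Counter` approach on the same first three rows (the list is typed in by hand rather than built from `df`). `most_common(5)` returns `(value, count)` pairs sorted by count, and values with equal counts keep the order in which they were first seen:

```python
from collections import Counter

# Player columns from the first three rows of the question's table, flattened
flat_list = [
    "Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox",
    "Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox",
    "Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw",
]

# Count every name, then keep the five largest counts
most_common_5 = Counter(flat_list).most_common(5)
for player, count in most_common_5:
    print(f"{player}: {count}")
```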

You can use np.unique() with return_counts=True together with np.argsort():

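import numpy as np
players, starts = np.unique(df[['t1_start1','t1_start2','t1_start3','t1_start4','t1_start5']].values, return_counts=True)  # distinct players and their start counts

players[np.argsort(-starts, kind='stable')][:5]  # most starts first; ties stay in alphabetical order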

Returns:

['Horace Grant' 'Kobe Bryant' 'Ron Harper' "Shaquille O'Neal" 'Brian Shaw']
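Neither answer groups by team or restricts the count to the most recent rows, which is what the question asks. A minimal pandas sketch of that, assuming the rows are already in chronological order (so each team's last 3 rows are its most recent games); it uses `groupby(...).tail(3)`, `melt` and `value_counts`, and the variable names are illustrative only:

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame(
    [
        ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox", "LAL"],
        ["Shaquille O'Neal", "Kobe Bryant", "Horace Grant", "Ron Harper", "Rick Fox", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Brian Shaw", "Ron Harper", "LAL"],
        ["Kobe Bryant", "Shaquille O'Neal", "Horace Grant", "Ron Harper", "Brian Shaw", "LAL"],
        ["LeBron James", "Brandon Ingram", "Kyle Kuzma", "JaVale McGee", "Lonzo Ball", "LAL"],
    ],
    columns=["t1_start1", "t1_start2", "t1_start3", "t1_start4", "t1_start5", "team1"],
)

start_cols = [f"t1_start{i}" for i in range(1, 6)]

# Keep the last 3 rows of each team (assumes chronological row order)
recent = df.groupby("team1").tail(3)

# One row per (team, player) appearance, ignoring which slot the player started in
appearances = recent.melt(id_vars="team1", value_vars=start_cols, value_name="player")

# Count appearances within each team (sorted descending), then keep the top 5 per team
top5 = (
    appearances.groupby("team1")["player"]
    .value_counts()
    .groupby(level=0)
    .head(5)
)
print(top5)
```

With only one team in the sample, every row belongs to `LAL`; its last three rows give Kobe Bryant, Shaquille O'Neal, Horace Grant, Ron Harper and Brian Shaw two starts each. The order among tied players is not guaranteed.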
