How to create a MultiIndex DataFrame sorted by group size?

import pandas as pd

df = pd.DataFrame({
    'IDs': list('abcdefgh'),
    'Val': ['foo', 'bar', 'foo', 'abc', 'bar', 'bar', 'foo', 'foo'],
})

  IDs  Val
0   a  foo
1   b  bar
2   c  foo
3   d  abc
4   e  bar
5   f  bar
6   g  foo
7   h  foo

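# Helper column: number of rows in each row's 'Val' group
df['groupsize'] = df.groupby('Val')['IDs'].transform('size')

# Largest groups first; 'Val' keeps equal-sized groups together,
# 'IDs' orders the rows within each group
df = (
    df.sort_values(['groupsize', 'Val', 'IDs'], ascending=[False, True, True])
      .drop('groupsize', axis=1)
      .set_index(['Val', 'IDs'])
)

# merge_cells=True writes repeated MultiIndex labels as merged cells
df.to_excel('example.xlsx', merge_cells=True)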
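The 'Val' key in the sort above matters when two groups have the same size: sorting on groupsize alone would not keep their rows together. A small sketch with hypothetical data (df2, with two groups x and y of size 2 each) shows the three-key sort keeping each group contiguous:

```python
import pandas as pd

# Hypothetical data: groups 'x' and 'y' both have size 2, rows interleaved
df2 = pd.DataFrame({'IDs': list('abcd'), 'Val': ['x', 'y', 'x', 'y']})
df2['groupsize'] = df2.groupby('Val')['IDs'].transform('size')

# Same three-key sort as the question: size descending, then Val, then IDs
out = (
    df2.sort_values(['groupsize', 'Val', 'IDs'], ascending=[False, True, True])
       .drop('groupsize', axis=1)
       .set_index(['Val', 'IDs'])
)
print(out.index.tolist())
# [('x', 'a'), ('x', 'c'), ('y', 'b'), ('y', 'd')]
```

Because both groups tie on size, the secondary 'Val' key is what groups the x rows before the y rows.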

2 answers

Anonymous user

Answer 1 · edited 2024-04-25 20:27:32

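Use set_index and value_counts: value_counts() lists the Val labels from most to least frequent, and .loc returns the rows in that label order.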

df.set_index('Val').loc[df.Val.value_counts().index]

Out[44]:
    IDs
Val
foo   a
foo   c
foo   g
foo   h
bar   b
bar   e
bar   f
abc   d
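To see why this works, here is a minimal sketch that splits the one-liner into its two steps on the same df. It relies on two pandas behaviors: value_counts() sorts counts in descending order by default, and .loc with a list of labels on a non-unique index returns every matching row for each label, in the order the labels are listed.

```python
import pandas as pd

df = pd.DataFrame({
    'IDs': list('abcdefgh'),
    'Val': ['foo', 'bar', 'foo', 'abc', 'bar', 'bar', 'foo', 'foo'],
})

# Step 1: group labels ordered by descending count
order = df.Val.value_counts().index
print(order.tolist())  # ['foo', 'bar', 'abc']

# Step 2: .loc on the non-unique 'Val' index pulls all rows per label,
# following the label order from step 1
out = df.set_index('Val').loc[order]
print(out['IDs'].tolist())  # ['a', 'c', 'g', 'h', 'b', 'e', 'f', 'd']
```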

If you need a MultiIndex, just chain a second set_index call with append=True:

df.set_index('Val').loc[df.Val.value_counts().index].set_index('IDs', append=True)
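The append=True flag is what keeps 'Val' in the index: without it, set_index replaces the existing index and 'Val' is discarded. A small sketch comparing both calls on the same df (the names res and replaced are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'IDs': list('abcdefgh'),
    'Val': ['foo', 'bar', 'foo', 'abc', 'bar', 'bar', 'foo', 'foo'],
})

# append=True adds 'IDs' as an inner level below the existing 'Val' level
res = (
    df.set_index('Val')
      .loc[df.Val.value_counts().index]
      .set_index('IDs', append=True)
)
print(res.index.nlevels)  # 2
print(res.shape)          # (8, 0): both columns now live in the index

# Without append=True the 'Val' index is replaced and lost
replaced = df.set_index('Val').set_index('IDs')
print(replaced.index.nlevels)  # 1
```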

Anonymous user

Answer 2 · edited 2024-04-25 20:27:32

You can use np.argsort on the negated group sizes together with iloc to avoid the longer multi-key sort_values call:

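import numpy as np

# Negate the sizes so the largest groups come first; kind='stable' keeps rows
# of equal size in their original order. Only the size is used as a key, so
# rows of two different groups with the same size would be interleaved.
s = np.argsort(-df.groupby('Val')['IDs'].transform('size'), kind='stable')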

df.iloc[s].set_index(['Val', 'IDs'])

Val IDs
foo a
    c
    g
    h
bar b
    e
    f
abc d
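A worked sketch of the intermediate values on the same df makes the argsort trick concrete: each row gets its group's size, negation turns "largest first" into an ascending sort, and the resulting positions feed iloc. This version converts the sizes to a NumPy array first and passes kind='stable' explicitly, since NumPy's default quicksort does not guarantee the order of tied elements.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'IDs': list('abcdefgh'),
    'Val': ['foo', 'bar', 'foo', 'abc', 'bar', 'bar', 'foo', 'foo'],
})

# Size of each row's group, aligned with the original rows
sizes = df.groupby('Val')['IDs'].transform('size')
print(sizes.tolist())  # [4, 3, 4, 1, 3, 3, 4, 4]

# Ascending stable argsort of the negated sizes = largest groups first
s = np.argsort(-sizes.to_numpy(), kind='stable')
print(s.tolist())  # [0, 2, 6, 7, 1, 4, 5, 3]

out = df.iloc[s].set_index(['Val', 'IDs'])
print(out.index.tolist())
```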
