Pandas: sort by order of appearance

import numpy as np
import pandas as pd

# pd.np was removed from recent pandas releases, so NumPy is imported directly
df = pd.DataFrame(np.zeros((15, 10)), dtype=int,
                  index=[['a']*5 + ['b']*5 + ['c']*5, list(range(15))])
df.index.names = ['index0', 'index1']
np.random.seed(0)
# Put a single 1 in a random column of each row; writing through df.iloc
# avoids relying on iterrows() returning a view of the frame
for i in range(len(df)):
    df.iloc[i, np.random.randint(10)] = 1
df

               0  1  2  3  4  5  6  7  8  9
index0 index1
a      0       0  0  0  0  0  1  0  0  0  0
       1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
b      5       0  0  0  0  0  0  0  0  0  1
       6       0  0  0  1  0  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       8       0  0  1  0  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
c      10      0  0  0  0  0  0  0  1  0  0
       11      0  0  0  0  0  0  1  0  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
       14      0  1  0  0  0  0  0  0  0  0

Desired result:

               0  1  2  3  4  5  6  7  8  9
index0 index1
a      1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       0       0  0  0  0  0  1  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
c      14      0  1  0  0  0  0  0  0  0  0
       11      0  0  0  0  0  0  1  0  0  0
       10      0  0  0  0  0  0  0  1  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
b      8       0  0  1  0  0  0  0  0  0  0
       6       0  0  0  1  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       5       0  0  0  0  0  0  0  0  0  1

1 answer

User

#1 · Posted on 2024-04-26 23:42:23

One approach is to combine pandas.DataFrame.groupby with idxmax and sort_values:

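import pandas as pd

# Within each index0 group, order rows by the column that holds the 1
l = (d.loc[d.idxmax(axis=1).sort_values(kind='stable').index] for _, d in df.groupby('index0'))
# Order the groups by their per-column counts, so groups with 1s in earlier columns come first
new_df = pd.concat(sorted(l, key=lambda x: list(x.sum()), reverse=True))
print(new_df)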

Output:

               0  1  2  3  4  5  6  7  8  9
index0 index1                              
a      1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       0       0  0  0  0  0  1  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
c      14      0  1  0  0  0  0  0  0  0  0
       11      0  0  0  0  0  0  1  0  0  0
       10      0  0  0  0  0  0  0  1  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
b      8       0  0  1  0  0  0  0  0  0  0
       6       0  0  0  1  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       5       0  0  0  0  0  0  0  0  0  1
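To make the two sorting steps easier to follow, here is a self-contained sketch of the same idea. The helper name sort_by_appearance is hypothetical, and the frame is rebuilt from the column positions shown in the question's table rather than from random draws, so the sketch does not depend on the seed.

```python
import numpy as np
import pandas as pd

# Column position of the 1 in each of the 15 rows, read off the question's table.
ones_at = [5, 0, 3, 3, 7, 9, 3, 5, 2, 4, 7, 6, 8, 8, 1]
index = pd.MultiIndex.from_arrays(
    [["a"] * 5 + ["b"] * 5 + ["c"] * 5, list(range(15))],
    names=["index0", "index1"],
)
df = pd.DataFrame(np.zeros((15, 10), dtype=int), index=index)
for row, col in enumerate(ones_at):
    df.iloc[row, col] = 1


def sort_by_appearance(frame, level="index0"):
    """Hypothetical helper: order rows, then groups, by where the 1s sit."""
    groups = []
    for _, group in frame.groupby(level=level):
        # Column label of the first maximum in each row, i.e. where the 1 is.
        first_one = group.idxmax(axis=1)
        # A stable sort keeps the original order of rows that tie.
        groups.append(group.loc[first_one.sort_values(kind="stable").index])
    # Compare groups by their per-column counts, largest first, so a group
    # with a 1 in an earlier column is placed before the others.
    groups.sort(key=lambda g: g.sum().tolist(), reverse=True)
    return pd.concat(groups)


new_df = sort_by_appearance(df)
print(new_df.index.get_level_values("index1").tolist())
```

The group key is a lexicographic comparison of column sums. If two groups have their first 1 in the same column, the group with more 1s in that column wins, which may or may not be the tie-breaking you want.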

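If the 1s are text (strings) and everything else is the same, try building a boolean mask with pandas.DataFrame.ne first:

tmp = df.ne(0)  # True wherever a cell is not 0
# same operation as above, applied to tmp instead of df
l = (d.loc[d.idxmax(axis=1).sort_values(kind='stable').index] for _, d in tmp.groupby('index0'))
new_tmp = pd.concat(sorted(l, key=lambda x: list(x.sum()), reverse=True))
new_df = df.loc[new_tmp.index]  # reorder the original frame by the sorted index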
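As a rough illustration of the text case, the sketch below uses a small made-up frame in which the marker is the string "x" rather than the integer 1. The frame and the names df_text and new_df_text are assumptions for the example only. The sorting runs on the boolean mask, and the resulting index is then used to reorder the original frame.

```python
import pandas as pd

# Small made-up frame: the marker is the string "x" instead of the integer 1.
index = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [0, 1, 2, 3]], names=["index0", "index1"]
)
df_text = pd.DataFrame(
    [[0, 0, "x"], [0, "x", 0], ["x", 0, 0], [0, 0, "x"]], index=index
)

# Boolean mask: True where the cell holds the marker, False where it is 0.
tmp = df_text.ne(0)

# Same two steps as for the numeric frame, applied to the mask.
parts = (
    d.loc[d.idxmax(axis=1).sort_values(kind="stable").index]
    for _, d in tmp.groupby(level="index0")
)
new_tmp = pd.concat(sorted(parts, key=lambda x: x.sum().tolist(), reverse=True))

# Reorder the original (text) frame using the index of the sorted mask.
new_df_text = df_text.loc[new_tmp.index]
print(new_df_text.index.get_level_values("index1").tolist())
```

Sorting the mask instead of the original frame avoids calling idxmax on columns that mix integers and strings.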
