Custom sort function for a pandas DataFrame
Sorry if this has been explained before, but I still can't find a suitable solution. I have a short piece of code that uses pandas.DataFrame:
import pandas as pd
table = {
"key4": ["key3", "command4"],
"key2": ["key1", "command2"],
"key3": ["cron3", "command3"],
"key5": ["cron5", "command5"],
"key1": ["cron1", "command1"]
}
columns = ["trigger", "command"]
df = pd.DataFrame.from_dict(table, orient='index', columns=columns)
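I want to order the rows by their parent: if key1 appears in the trigger column, that row should come directly after the row named key1. (I only expect key1 to appear once as a name and once as a trigger value.) Or is this too complicated, and should I use a different format? In other words, the printed df should look like this: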
      trigger   command
key3    cron3  command3
key4     key3  command4
key5    cron5  command5
key1    cron1  command1
key2     key1  command2
Can I somehow pass a function to df.sort_values() so that the sort can be customized? Thanks!
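For reference, DataFrame.sort_values does accept a key callable (pandas 1.1 and later), but it is not a pairwise comparator: it is applied to each sort column as a whole, only sees that one column, and must return a Series of the same length. That makes it suitable for per-column transformations, such as sorting on the number at the end of command, but not for rules that depend on other rows, such as the parent relation asked about here. A minimal sketch, assuming the table from the question (the regex is just one illustrative choice):

```python
import pandas as pd

table = {
    "key4": ["key3", "command4"],
    "key2": ["key1", "command2"],
    "key3": ["cron3", "command3"],
    "key5": ["cron5", "command5"],
    "key1": ["cron1", "command1"],
}
df = pd.DataFrame.from_dict(table, orient="index", columns=["trigger", "command"])

# key receives the whole 'command' column as a Series and returns a
# same-length Series of sort keys: here, the trailing number as an int
out = df.sort_values(
    "command",
    key=lambda s: s.str.extract(r"(\d+)$", expand=False).astype(int),
)
print(out)
```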
1 answer
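You can use a mask and numpy.lexsort. The mask m is True for rows whose trigger is the name of another row; key holds the parent's name for those rows and the row's own name for the others. Sorting by key groups each child with its parent, and sorting by m within a group puts the parent (False) before the child (True):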
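import numpy as np
# True where the trigger is the name of another row (the row has a parent)
m = df['trigger'].isin(df.index)
# group key: the parent's name for child rows, the row's own name otherwise
key = df['trigger'].where(m, df.index)
# np.lexsort uses the LAST array as the primary key: sort by key, then by m
out = df.iloc[np.lexsort([m, key])]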
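np.lexsort treats the last array it is given as the primary sort key, which is why the call passes [m, key] rather than [key, m]. A small standalone check with plain NumPy arrays holding the same values as key and m for the example df (rows key4, key2, key3, key5, key1):

```python
import numpy as np

# same values as key and m for the rows key4, key2, key3, key5, key1
key = np.array(["key3", "key1", "key3", "key5", "key1"])
m = np.array([True, True, False, False, False])

# last array = primary key (key); m breaks ties, False (parent) before True (child)
order = np.lexsort([m, key])
print(order)  # [4 1 2 0 3] -> rows key1, key2, key3, key4, key5
```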
Or with pure pandas (less elegant in my opinion, but more flexible if you need a custom group order; see below):
out = (df
       .assign(m=df['trigger'].isin(df.index),
               # use the m column just created; a bare m would refer to an outside variable
               key=lambda d: d['trigger'].where(d['m'], d.index))
       .sort_values(by=['key', 'm'])
       .drop(columns=['m', 'key'])
      )
Output:
      trigger   command
key1    cron1  command1
key2     key1  command2
key3    cron3  command3
key4     key3  command4
key5    cron5  command5
Intermediate results (with the helper columns m and key):
# before sorting
      trigger   command      m   key
key4     key3  command4   True  key3
key2     key1  command2   True  key1
key3    cron3  command3  False  key3
key5    cron5  command5  False  key5
key1    cron1  command1  False  key1
# after sorting
      trigger   command      m   key
key1    cron1  command1  False  key1
key2     key1  command2   True  key1
key3    cron3  command3  False  key3
key4     key3  command4   True  key3
key5    cron5  command5  False  key5
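In the pure-pandas version, DataFrame.assign evaluates its keyword arguments in order (pandas 0.23+ on Python 3.6+), and a callable argument receives the DataFrame as built so far, so it can read a column that an earlier keyword just created, such as the m column here. A minimal illustration on a throwaway frame (the column names a, b and c are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
# b is created first, so the callable for c can already read d["b"]
out = df.assign(b=df["a"] * 10, c=lambda d: d["b"] + 1)
print(out["c"].tolist())  # [11, 21, 31]
```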
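If you want to keep the original order of the groups (key3 -> key5 -> key1), make key an ordered Categorical whose categories are the parent rows in their order of appearance: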
out = (df
       .assign(m=df['trigger'].isin(df.index),
               # categories = parent rows in order of appearance (key3, key5, key1)
               key=lambda d: pd.Categorical(d['trigger'].where(d['m'], d.index),
                                            categories=d.index[~d['m']].unique(),
                                            ordered=True)
              )
       .sort_values(by=['key', 'm'])
       .drop(columns=['m', 'key'])
      )
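This keeps key3 -> key5 -> key1 because a Categorical sorts by the position of each value in categories, not alphabetically. A small check on the key values from the example (rows key4, key2, key3, key5, key1 have keys key3, key1, key3, key5, key1):

```python
import pandas as pd

# categories list the groups in their original order of appearance
cat = pd.Categorical(
    ["key3", "key1", "key3", "key5", "key1"],
    categories=["key3", "key5", "key1"],
    ordered=True,
)
s = pd.Series(cat)
# sorting follows the category order, not alphabetical order
print(s.sort_values().tolist())  # ['key3', 'key3', 'key5', 'key1', 'key1']
```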
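Variant: put the parent rows first with concat, then use a stable sort on key alone instead of sorting on m: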
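m = df['trigger'].isin(df.index)
key = df['trigger'].where(m, df.index)
# parent rows in order of appearance (for this df: key3, key5, key1)
order = df.index[~m].unique()
tmp = df.assign(key=pd.Categorical(key, categories=order, ordered=True))
# parents first, then children; the stable sort keeps that order within each group
out = (pd.concat([tmp[~m], tmp[m]]).sort_values(by='key', kind='stable')
       .drop(columns='key')
      )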
Output:
      trigger   command
key3    cron3  command3
key4     key3  command4
key5    cron5  command5
key1    cron1  command1
key2     key1  command2
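All of the versions above look only one level up: a row's group key is its direct parent, so a grandchild (a row whose trigger names a row that itself has a parent) is not guaranteed to end up directly after its parent; in the Categorical versions its key is not even one of the categories. If such chains can occur, one alternative, not part of the answer above, is a depth-first walk from each root row. A sketch under that assumption, where the helper name parent_first_order and the extra key6 row are hypothetical:

```python
import pandas as pd


def parent_first_order(df: pd.DataFrame, col: str = "trigger") -> list:
    """Hypothetical helper: index labels ordered so every row follows its parent.

    A row's parent is the row whose name (index label) equals its trigger.
    Roots and siblings keep their original relative order.
    """
    children = {label: [] for label in df.index}
    roots = []
    for label, parent in df[col].items():
        if parent in children and parent != label:
            children[parent].append(label)  # child of another row
        else:
            roots.append(label)  # trigger is not a row name: top-level row

    order = []

    def visit(label):
        # depth-first: emit the row, then each of its children's subtrees
        order.append(label)
        for child in children[label]:
            visit(child)

    for root in roots:
        visit(root)
    if len(order) != len(df):
        # rows on a cycle (a -> b -> a) are never reached from a root
        raise ValueError("cycle in trigger references")
    return order


table = {
    "key4": ["key3", "command4"],
    "key2": ["key1", "command2"],
    "key3": ["cron3", "command3"],
    "key5": ["cron5", "command5"],
    "key1": ["cron1", "command1"],
    "key6": ["key2", "command6"],  # hypothetical grandchild of key1
}
df = pd.DataFrame.from_dict(table, orient="index", columns=["trigger", "command"])
out = df.loc[parent_first_order(df)]
print(out.index.tolist())  # ['key3', 'key4', 'key5', 'key1', 'key2', 'key6']
```

Without the key6 row, this produces the same order as the Categorical versions (key3, key4, key5, key1, key2).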