How do I sort and extract data from a DataFrame column based on priority and specific values?

Posted 2024-05-15 10:28:01

I have two DataFrames.

The first is df1:

import pandas as pd

data = {'Type':['a', 'b', 'c', 'd', 'e'],
        'Rank':[1, 2, 3, 4, 5]}
df1 = pd.DataFrame(data)

The DataFrame looks like this:

   Type Rank
0   a   1
1   b   2
2   c   3
3   d   4
4   e   5

The second is df2:

variants = {'Variants':['K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||',
'J|c&d&a|MOD|,J|b&c&d|MOD||',
'H|b&c|HIGH|,H|b|HIGH||',
'H|b&c|HIGH||',
'K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||',
'-|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||',]}
df2 = pd.DataFrame(variants)

df2 looks like this:

    Variants
0   K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||
1   J|c&d&a|MOD|,J|b&c&d|MOD||
2   H|b&c|HIGH|,H|b|HIGH||
3   H|b&c|HIGH||
4   K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||
5   -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||

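I am trying to extract only the top-ranked Type from Variants by splitting on ',', '&' and '|'. I want exactly one value per row of Variants: the comma-separated entry that contains the Type with the highest rank in df1 (Rank 1 is the highest).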

I used the following code to generate the output:

d = df1.set_index('Type')['Rank'].to_dict()
df = (df2.assign(Extracted = df2['Variants'].str.split(','))
        .explode('Extracted')
        .assign(Ranked = lambda x: x['Extracted'].str.split(r'&|\|'))
        .explode('Ranked')
        .assign(Rank = lambda x: x['Ranked'].map(d))
        .sort_values('Rank')
        )
df = df[~df.index.duplicated()].sort_index()

The result is:

    Variants                            Extracted     Ranked    Rank
0   K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||    K|a&b|MOD||     a       1.0
1   J|c&d&a|MOD|,J|b&c&d|MOD||          J|b&c&d|MOD||   a       1.0
2   H|b&c|HIGH|,H|b|HIGH||              H|b|HIGH||      b       2.0
3   H|b&c|HIGH||                        H|b&c|HIGH||    b       2.0
4   K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||  K|d|LOW||       a       1.0
5   -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||   K|a&b|MOD|||    a       1.0

However, this produces incorrect output for some rows. Here, df['Extracted'] in the 5th row (index 4) should be K|a&e|MOD||| or K|a&b|MOD||, but it has taken K|d|LOW|| instead.

The expected output is:

    Variants                            Extracted     Ranked    Rank
0   K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||    K|a&b|MOD||     a       1.0
1   J|c&d&a|MOD|,J|b&c&d|MOD||          J|b&c&d|MOD||   a       1.0
2   H|b&c|HIGH|,H|b|HIGH||              H|b|HIGH||      b       2.0
3   H|b&c|HIGH||                        H|b&c|HIGH||    b       2.0
4   K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||  K|a&e|MOD|||    a       1.0
5   -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||   K|a&b|MOD|||    a       1.0

Thanks for your help.
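
One possible way to approach this, offered as a sketch rather than a definitive fix, is to avoid the double explode and scan each Variants string directly: split it on ',', split each entry on '&' and '|', and keep the entry whose Type has the lowest Rank number. The helper name best_segment and the tie-break rule (the earliest entry wins when two entries share the same best Rank) are assumptions of this sketch; the question does not state how ties should be resolved.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Type': ['a', 'b', 'c', 'd', 'e'],
                    'Rank': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Variants': [
    'K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||',
    'J|c&d&a|MOD|,J|b&c&d|MOD||',
    'H|b&c|HIGH|,H|b|HIGH||',
    'H|b&c|HIGH||',
    'K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||',
    '-|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||',
]})

# Lookup from Type to Rank; a lower number means a higher priority.
rank_of = df1.set_index('Type')['Rank'].to_dict()

def best_segment(variants):
    """Hypothetical helper: return (entry, Type, Rank) for the best-ranked Type."""
    best = (None, None, float('nan'))
    for segment in variants.split(','):
        for token in re.split(r'[&|]', segment):
            rank = rank_of.get(token)  # None for tokens such as 'K' or 'MOD'
            # Strict '<' keeps the earliest entry on ties (an assumption).
            if rank is not None and (best[0] is None or rank < best[2]):
                best = (segment, token, rank)
    return best

# Build the three new columns row by row, keeping df2's original index.
picked = pd.DataFrame(
    [best_segment(v) for v in df2['Variants']],
    columns=['Extracted', 'Ranked', 'Rank'],
    index=df2.index,
)
result = df2.join(picked)
print(result)
```

Under these assumptions the row at index 4 yields K|a&e|MOD|||, as requested. Rows with ties (for example index 5, where two entries contain a) resolve to the earlier entry, and index 1 resolves to J|c&d&a|MOD|, the only entry in that row containing a, so those rows need not match the expected table above.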

