I have two DataFrames.
First, df1:
import pandas as pd

data = {'Type': ['a', 'b', 'c', 'd', 'e'],
        'Rank': [1, 2, 3, 4, 5]}
df1 = pd.DataFrame(data)
The frame looks like this:
  Type  Rank
0    a     1
1    b     2
2    c     3
3    d     4
4    e     5
Second, df2:
variants = {'Variants': ['K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||',
                         'J|c&d&a|MOD|,J|b&c&d|MOD||',
                         'H|b&c|HIGH|,H|b|HIGH||',
                         'H|b&c|HIGH||',
                         'K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||',
                         '-|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||']}
df2 = pd.DataFrame(variants)
df2 looks like this:
Variants
0 K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||
1 J|c&d&a|MOD|,J|b&c&d|MOD||
2 H|b&c|HIGH|,H|b|HIGH||
3 H|b&c|HIGH||
4 K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||
5 -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||
I am trying to split Variants and extract, for each row, the value whose df1['Type'] has the highest rank.
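To make the goal concrete, here is a minimal sketch for a single Variants string. It assumes that "highest rank" means the smallest Rank number (as the tables in this post suggest) and that types are separated by & or |. The helper name segment_rank is hypothetical and used only for illustration.

```python
import re

# Rank lookup equivalent to df1 (Type -> Rank)
rank = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

variants = 'K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||'

def segment_rank(segment):
    """Return the best (smallest) rank of any known type in one segment, or None."""
    ranks = [rank[token] for token in re.split(r'[&|]', segment) if token in rank]
    return min(ranks) if ranks else None

# Best rank found in each comma-separated segment
ranks_by_segment = {seg: segment_rank(seg) for seg in variants.split(',')}
print(ranks_by_segment)
# {'K|d|LOW||': 4, 'K|a&e|MOD|||': 1, 'K|a&b|MOD||': 1}
```

For this string, either of the two segments containing a would be an acceptable pick, while K|d|LOW|| would not.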
I used the following code to produce the output:
# Map each Type to its Rank
d = df1.set_index('Type')['Rank'].to_dict()
df = (df2.assign(Extracted=df2['Variants'].str.split(','))
         .explode('Extracted')
         .assign(Ranked=lambda x: x['Extracted'].str.split(r'&|\|'))
         .explode('Ranked')
         .assign(Rank=lambda x: x['Ranked'].map(d))
         .sort_values('Rank')
     )
# Keep the first entry per original row after sorting by Rank
df = df[~df.index.duplicated()].sort_index()
The result is:
   Variants                            Extracted      Ranked  Rank
0  K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||    K|a&b|MOD||    a       1.0
1  J|c&d&a|MOD|,J|b&c&d|MOD||          J|b&c&d|MOD||  a       1.0
2  H|b&c|HIGH|,H|b|HIGH||              H|b|HIGH||     b       2.0
3  H|b&c|HIGH||                        H|b&c|HIGH||   b       2.0
4  K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||  K|d|LOW||      a       1.0
5  -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||   K|a&b|MOD|||   a       1.0
However, it produces incorrect output for some rows. Here, df['Extracted'] in the fifth row (index 4) should be K|a&e|MOD||| or K|a&b|MOD||, but K|d|LOW|| was picked instead.
The expected output is:
   Variants                            Extracted      Ranked  Rank
0  K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||    K|a&b|MOD||    a       1.0
1  J|c&d&a|MOD|,J|b&c&d|MOD||          J|b&c&d|MOD||  a       1.0
2  H|b&c|HIGH|,H|b|HIGH||              H|b|HIGH||     b       2.0
3  H|b&c|HIGH||                        H|b&c|HIGH||   b       2.0
4  K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||  K|a&e|MOD|||   a       1.0
5  -|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||   K|a&b|MOD|||   a       1.0
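One possible way to get exactly one segment per original row is sketched below. It rests on two observations: after explode the index is no longer unique, and the default sort used by sort_values is not guaranteed to be stable, so which duplicate survives can be hard to predict. The sketch keeps the original row number in an explicit column (named row here, a hypothetical name), resets the working index so it stays unique, and uses a stable sort. Ties are broken by first occurrence, which is an assumption; where several segments share the best rank, the segment picked may differ from the table above while having the same Rank.

```python
import pandas as pd

df1 = pd.DataFrame({'Type': ['a', 'b', 'c', 'd', 'e'],
                    'Rank': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Variants': [
    'K|a&b|MOD||,K|d|LOW|,K|a&e|MOD||',
    'J|c&d&a|MOD|,J|b&c&d|MOD||',
    'H|b&c|HIGH|,H|b|HIGH||',
    'H|b&c|HIGH||',
    'K|d|LOW||,K|a&e|MOD|||,K|a&b|MOD||',
    '-|d|LOW|,K|a&e|MOD||,K|a&b|MOD|||',
]})

d = df1.set_index('Type')['Rank'].to_dict()

# One row per comma-separated segment. 'row' (hypothetical name) remembers
# the original df2 index; reset_index makes the working index unique again.
segments = (df2.assign(Extracted=df2['Variants'].str.split(','))
               .explode('Extracted')
               .rename_axis('row')
               .reset_index())

# One row per token inside each segment, again with a fresh unique index.
tokens = (segments.assign(Ranked=segments['Extracted'].str.split(r'&|\|'))
                  .explode('Ranked')
                  .reset_index(drop=True))
tokens['Rank'] = tokens['Ranked'].map(d)  # tokens such as 'K' or 'MOD' become NaN

# Stable sort: equal ranks keep their left-to-right order, so the first
# best-ranked segment of each original row is the one that is kept.
result = (tokens.sort_values('Rank', kind='mergesort')
                .drop_duplicates('row')
                .sort_values('row')
                .set_index('row'))
print(result[['Extracted', 'Ranked', 'Rank']])
```

With this sketch, index 4 resolves to K|a&e|MOD||| rather than K|d|LOW||, because the d entry (Rank 4) sorts after both a entries (Rank 1).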
Thank you for your help.