Replace values in a DataFrame based on column values in another DataFrame

Posted 2024-06-16 09:31:30


I have a DataFrame:

section            name     overall   admission        room              
0        Supriya Bachal  4432837753  4431710642  4431711344
1          Meena Kumari  4432837752  4431710642  4431711344
2          Sunita Banik  4432837752  4431710643  4431711346
3          Madhuri Bhat  4432837753  4431710643  4431711347
4         Arushi Sharda  4432837753  4431710643  4431711347
5          Vishwas Kini  4432837753  4431710643  4431711347
6          Nishit goyal  4432837752  4431710642  4431711346
7         Shibiraj Soni  4432837753         NaN  4431711347  

and another DataFrame:

   rating     overall   admission        room
0       1  4432837749  4431710639  4431711343
1       2  4432837750  4431710640  4431711344
2       3  4432837751  4431710641  4431711345
3       4  4432837752  4431710642  4431711346
4       5  4432837753  4431710643  4431711347  

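It maps the IDs in each section (overall, admission, and room) to a rating (1 to 5).

Now I want to replace the IDs in the first DataFrame with their corresponding ratings.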

Final DataFrame:

section            name  overall  admission  room              
0        Supriya Bachal        5          4     2
1          Meena Kumari        4          4     2
2          Sunita Banik        4          5     4
3          Madhuri Bhat        5          5     5
4         Arushi Sharda        5          5     5
5          Vishwas Kini        5          5     5
6          Nishit goyal        4          4     4
7         Shibiraj Soni        5        NaN     5   

We have 10 such columns, so writing an if-else for each column is not feasible.

Is there an easy way to do this?

TIA


2 answers

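You can map the values with set_index and map (here df is the DataFrame holding the IDs and df1 is the rating lookup table):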

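# Copy the columns that stay unchanged; .copy() avoids SettingWithCopyWarning
df3 = df[['section','name']].copy()
# For each ID column, look up the rating keyed by that column's IDs
for col in ['overall','admission', 'room']:
    df3[col] = df[col].map(df1.set_index(col)['rating'])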

Output:

name    overall admission   room
0   Supriya Bachal  5   4.0 2
1   Meena Kumari    4   4.0 2
2   Sunita Banik    4   5.0 4
3   Madhuri Bhat    5   5.0 5
4   Arushi Sharda   5   5.0 5
5   Vishwas Kini    5   5.0 5
6   Nishit goyal    4   4.0 4
7   Shibiraj Soni   5   NaN 5
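For anyone who wants to try the set_index/map approach without the original data, here is a minimal self-contained sketch. The frames are rebuilt from three of the rows shown in the question, and the names `ratings`, `id_cols`, `lookup`, and `result` are mine, not from the answer. It also derives the column list from the lookup table, which may help with the "10 columns" case, assuming every non-rating column of the lookup table also exists in the main frame.

```python
import numpy as np
import pandas as pd

# Main frame holding IDs (three rows taken from the question).
df = pd.DataFrame({
    "name": ["Supriya Bachal", "Meena Kumari", "Shibiraj Soni"],
    "overall": [4432837753, 4432837752, 4432837753],
    "admission": [4431710642, 4431710642, np.nan],
    "room": [4431711344, 4431711344, 4431711347],
})

# Lookup frame: one rating per row, one ID column per section.
ratings = pd.DataFrame({
    "rating": [1, 2, 3, 4, 5],
    "overall": [4432837749, 4432837750, 4432837751, 4432837752, 4432837753],
    "admission": [4431710639, 4431710640, 4431710641, 4431710642, 4431710643],
    "room": [4431711343, 4431711344, 4431711345, 4431711346, 4431711347],
})

# Every lookup column except "rating" is treated as an ID column (assumption).
id_cols = [c for c in ratings.columns if c != "rating"]

result = df.copy()
for col in id_cols:
    # Series indexed by this column's IDs, with the ratings as values.
    lookup = ratings.set_index(col)["rating"]
    # IDs missing from the lookup (including NaN) become NaN.
    result[col] = df[col].map(lookup)

print(result)
```

Note that map turns any ID that is absent from the lookup into NaN, so a column with missing or unknown IDs comes back as float, as in the admission column above.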

Edit 1

# Time taken by each solution

# set_index + map solution
%%timeit
df3 = df[['section','name']]
for col in ['overall','admission', 'room']:
    df3[col] = df[col].map(df1.set_index(col)['rating'])
2.42 ms ± 70.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Shubham's solution
%%timeit
df.replace(df1.melt('rating').pivot(index='value', columns='variable', values='rating'))
4.82 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

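Use DataFrame.replace (here df1 is the DataFrame holding the IDs and df2 is the rating lookup table):

df1.replace(df2.melt('rating').pivot(index='value', columns='variable', values='rating'))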

   section            name  overall  admission  room
0        0  Supriya Bachal      5.0        4.0   2.0
1        1    Meena Kumari      4.0        4.0   2.0
2        2    Sunita Banik      4.0        5.0   4.0
3        3    Madhuri Bhat      5.0        5.0   5.0
4        4   Arushi Sharda      5.0        5.0   5.0
5        5    Vishwas Kini      5.0        5.0   5.0
6        6    Nishit goyal      4.0        4.0   4.0
7        7   Shibiraj Soni      5.0        NaN   5.0
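A variation on the same idea, in case passing a pivoted DataFrame to replace feels opaque: build an explicit nested dict of the form {column: {id: rating}} and hand that to DataFrame.replace. This is only a sketch, not the answerer's code. The frames are rebuilt from three rows of the question, and `ratings`, `mapping`, and `result` are names I made up.

```python
import numpy as np
import pandas as pd

# Main frame holding IDs (three rows taken from the question).
df = pd.DataFrame({
    "name": ["Supriya Bachal", "Meena Kumari", "Shibiraj Soni"],
    "overall": [4432837753, 4432837752, 4432837753],
    "admission": [4431710642, 4431710642, np.nan],
    "room": [4431711344, 4431711344, 4431711347],
})

# Lookup frame: one rating per row, one ID column per section.
ratings = pd.DataFrame({
    "rating": [1, 2, 3, 4, 5],
    "overall": [4432837749, 4432837750, 4432837751, 4432837752, 4432837753],
    "admission": [4431710639, 4431710640, 4431710641, 4431710642, 4431710643],
    "room": [4431711343, 4431711344, 4431711345, 4431711346, 4431711347],
})

# Nested dict {column name: {ID: rating}}, one inner dict per ID column.
mapping = {
    col: dict(zip(ratings[col].tolist(), ratings["rating"].tolist()))
    for col in ratings.columns.drop("rating")
}

# Only the columns named in the mapping are touched; "name" is left alone.
result = df.replace(mapping)

print(result)
```

One difference from the map approach: replace leaves any ID it cannot find unchanged, whereas map turns it into NaN, so which one fits depends on how you want unknown IDs handled.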
