Elegant way to replace multiple values in a dataframe without a mapping?

Posted 2024-04-26 09:50:33


I have a dataframe like the one below:

import pandas as pd
df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN','HISPANIC/LATINO - COLOMBIAN',
                                 'HISPANIC/LATINO - MEXICAN','ASIAN','ASIAN - INDIAN','ASIAN - KOREAN','PORTUGUESE','MIDDLE-EASTERN','UNKNOWN',
                                 'USER DECLINED','OTHERS']})


I want to replace the values in the ethnicity column. For example, if a value is ASIAN - INDIAN, I just want to replace it with ASIAN.

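Similarly, I want to replace strings containing AMERICAN, WHITE or HISPANIC with just that word, and replace every other string with Others. This is what I have tried: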

df1.loc[df1.ethnicity.str.contains('WHITE'), 'ethnicity'] = "WHITE"
df1.loc[df1.ethnicity.str.contains('ASIAN'), 'ethnicity'] = "ASIAN"
df1.loc[df1.ethnicity.str.contains('HISPANIC'), 'ethnicity'] = "HISPANIC"
df1.loc[df1.ethnicity.str.contains('AMERICAN'), 'ethnicity'] = "AMERICAN"
# df1.loc[<rows matching none of the above>, 'ethnicity'] = "Others"
# ^ here I don't know how to replace all other ethnicities with Others at once
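The missing last .loc step can be done without listing the other ethnicities: compute, from an untouched copy of the column, which rows contain none of the keywords, and assign Others to exactly those rows. A minimal sketch under that assumption; keywords, original and no_match are illustrative names, not part of the question:

```python
import pandas as pd

df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN',
                                  'HISPANIC/LATINO - COLOMBIAN', 'HISPANIC/LATINO - MEXICAN', 'ASIAN',
                                  'ASIAN - INDIAN', 'ASIAN - KOREAN', 'PORTUGUESE', 'MIDDLE-EASTERN',
                                  'UNKNOWN', 'USER DECLINED', 'OTHERS']})

keywords = ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN']

# Keep an untouched copy so every mask is computed on the original labels,
# not on values already overwritten by an earlier keyword.
original = df1['ethnicity'].copy()

for kw in keywords:
    # regex=False: match the keyword as a plain substring
    df1.loc[original.str.contains(kw, regex=False), 'ethnicity'] = kw

# Rows containing none of the keywords become 'Others' in one step.
no_match = ~original.str.contains('|'.join(keywords))
df1.loc[no_match, 'ethnicity'] = 'Others'

print(df1['ethnicity'].tolist())
```

Because every mask is taken from the original labels, a value containing two keywords ends up with whichever keyword comes last in the list.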

I want my output to look like this:

[Image: expected output]


1 answer
Anonymous user
Answer #1 · posted 2024-04-26 09:50:33

Use Series.str.extract with a pattern built from the values of the list. Rows with no match return NaN, so add Series.fillna:

L = ['WHITE','ASIAN','HISPANIC','AMERICAN']

print (f'({"|".join(L)})')
(WHITE|ASIAN|HISPANIC|AMERICAN)

df1.ethnicity = df1.ethnicity.str.extract(f'({"|".join(L)})', expand=False).fillna('Others')
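If the keyword list may ever contain regex metacharacters (for example . or +), escaping each entry with re.escape before joining makes str.extract match it literally. A sketch wrapping the one-liner above in a helper; the name collapse_to_keywords and its default parameter are hypothetical, not a pandas API:

```python
import re
import pandas as pd

def collapse_to_keywords(s: pd.Series, keywords, default='Others') -> pd.Series:
    """Replace each value with the keyword occurring earliest in it,
    or with `default` when no keyword occurs."""
    # One capture group of escaped alternatives, e.g. '(WHITE|ASIAN|HISPANIC|AMERICAN)'
    pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'
    # expand=False returns a Series for a single group; non-matches are NaN
    return s.str.extract(pattern, expand=False).fillna(default)

df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN',
                                  'HISPANIC/LATINO - COLOMBIAN', 'HISPANIC/LATINO - MEXICAN', 'ASIAN',
                                  'ASIAN - INDIAN', 'ASIAN - KOREAN', 'PORTUGUESE', 'MIDDLE-EASTERN',
                                  'UNKNOWN', 'USER DECLINED', 'OTHERS']})
df1['ethnicity'] = collapse_to_keywords(df1['ethnicity'], ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN'])
print(df1)
```

Without re.escape, a keyword such as U.S. would also match strings like UXSY, because . matches any character.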

Or write the pattern out directly:

df1.ethnicity = (df1.ethnicity.str.extract('(WHITE|ASIAN|AMERICAN|HISPANIC)', expand=False)
                    .fillna('Others'))

print (df1)
   ethnicity
0   AMERICAN
1      WHITE
2      WHITE
3   HISPANIC
4   HISPANIC
5      ASIAN
6      ASIAN
7      ASIAN
8     Others
9     Others
10    Others
11    Others
12    Others
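One behavioral difference between the two styles: str.extract with an alternation returns the match that starts earliest in each string, whereas a sequential .loc chain that re-checks the already overwritten column lets the keyword list order decide. The two labels below are hypothetical and only illustrate the difference:

```python
import pandas as pd

keywords = ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN']
# Hypothetical labels that each contain two keywords
s = pd.Series(['ASIAN - WHITE', 'WHITE - ASIAN'])

# Regex alternation: the match starting earliest in each string wins
by_extract = s.str.extract(f'({"|".join(keywords)})', expand=False).fillna('Others')

# Sequential .loc chain on the same column: once a row is overwritten with 'WHITE',
# the later 'ASIAN' check no longer matches it, so list order wins
chain = pd.DataFrame({'ethnicity': s})
for kw in keywords:
    chain.loc[chain['ethnicity'].str.contains(kw), 'ethnicity'] = kw

print(by_extract.tolist())          # ['ASIAN', 'WHITE']
print(chain['ethnicity'].tolist())  # ['WHITE', 'WHITE']
```

For the data in the question no label contains two keywords, so both approaches give the same result.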
