Pandas: .replace conflicts with .str.replace regex, depending on code order

Posted 2024-05-13 18:55:56


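My task is to remove anything in parentheses and any digits that follow a country name, and to rename several countries.

For example, 'Bolivia (Plurinational State of)' should become 'Bolivia' and 'Switzerland17' should become 'Switzerland'.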
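A minimal sketch of the two cleanup steps on the example values above. It passes `regex=True` explicitly because, since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default; the Series `s` is illustrative only:

```python
import pandas as pd

s = pd.Series(["Bolivia (Plurinational State of)", "Switzerland17"])

# Drop a space followed by anything in parentheses, e.g. " (Plurinational State of)"
s = s.str.replace(r" \(.*\)", "", regex=True)
# Drop runs of digits, e.g. the trailing "17"
s = s.str.replace(r"\d+", "", regex=True)

print(s.tolist())  # ['Bolivia', 'Switzerland']
```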

My initial code was:

dict1 = {
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"} 

energy['Country'] = energy['Country'].replace(dict1)
energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
energy.loc[energy['Country'] == 'United States']

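The str.replace part works fine and does its job. But when I use the last line to check whether the country names were changed, this original code does not work. However, if I change the order of the code to: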

energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
energy['Country'] = energy['Country'].replace(dict1)

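then the country names are changed successfully. So something must be wrong with my regex syntax. How do I resolve this conflict, and why does it happen?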
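A minimal reproduction of the order dependence, assuming (hypothetically) that the raw value carries trailing digits the way 'Switzerland17' does. `Series.replace` with a dict and the default `regex=False` only replaces values that are equal to a key, so it has nothing to match until the digits are gone. The `clean` helper is a hypothetical name wrapping the same two regex steps:

```python
import pandas as pd

dict1 = {"United States of America": "United States"}

def clean(s):
    # Hypothetical helper: the same two regex steps as in the question
    s = s.str.replace(r" \(.*\)", "", regex=True)
    return s.str.replace(r"\d+", "", regex=True)

raw = pd.Series(["United States of America20"])  # hypothetical raw value

# Order 1: exact-match replace first; no key equals the whole value, so nothing is renamed
replace_first = clean(raw.replace(dict1))
# Order 2: clean first; the value now equals the key, so the rename succeeds
clean_first = clean(raw).replace(dict1)

print(replace_first.tolist())  # ['United States of America']
print(clean_first.tolist())    # ['United States']
```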


1 answer
User
#1 · Posted 2024-05-13 18:55:56

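The problem is that, by default, replace with a dict only replaces whole values that exactly match a key, so a value like 'United States of America4' is left unchanged while the trailing digits are still there. Your regexes are fine; to replace substrings you need regex=True: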

import pandas as pd

energy = pd.DataFrame({'Country':['United States of America4',
                                  'United States of America (aaa)','Slovakia']})
print (energy)
                          Country
0       United States of America4
1  United States of America (aaa)
2                        Slovakia

dict1 = {
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"} 

energy['Country'] = energy['Country'].replace(dict1, regex=True)
print (energy)
               Country
0       United States4
1  United States (aaa)
2             Slovakia

energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
print (energy)
         Country
0  United States
1  United States
2       Slovakia

print (energy.loc[energy['Country'] == 'United States'])
         Country
0  United States
1  United States
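One caveat with regex=True, shown as a sketch: every key becomes a pattern that may match anywhere in the value, so a value such as "Democratic People's Republic of Korea" would be partially renamed. Anchoring each escaped key at the start of the value (an assumption about what is acceptable for this data, not part of the original answer) keeps trailing digits matchable while avoiding that:

```python
import re
import pandas as pd

dict1 = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
}
s = pd.Series(["Democratic People's Republic of Korea", "Republic of Korea",
               "United States of America4"])

# Unanchored: "Republic of Korea" also matches inside the first value
loose = s.replace(dict1, regex=True)
print(loose[0])  # Democratic People's South Korea

# Anchored: each key must start the value; re.escape guards against special characters
anchored = {"^" + re.escape(k): v for k, v in dict1.items()}
strict = s.replace(anchored, regex=True)
print(strict.tolist())
# ["Democratic People's Republic of Korea", 'South Korea', 'United States4']
```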

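Alternatively, starting again from the original energy, do the regex cleaning first; after that, plain replace works because the values now match the dictionary keys exactly:

energy = pd.DataFrame({'Country':['United States of America4',
                                  'United States of America (aaa)','Slovakia']})
#first data cleaning
energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)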
print (energy)
                    Country
0  United States of America
1  United States of America
2                  Slovakia

#now exact-match replace works
energy['Country'] = energy['Country'].replace(dict1)
print (energy)
         Country
0  United States
1  United States
2       Slovakia

print (energy.loc[energy['Country'] == 'United States'])
         Country
0  United States
1  United States
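The clean-then-rename order can be wrapped in a small helper. `clean_country_names` is a hypothetical name, and the extra `.str.strip()` is a defensive assumption that is not in the original answer:

```python
import pandas as pd

def clean_country_names(col, renames):
    """Hypothetical helper: regex cleanup first, then exact-match renaming."""
    col = col.str.replace(r" \(.*\)", "", regex=True)  # drop " (...)"
    col = col.str.replace(r"\d+", "", regex=True)       # drop digits
    col = col.str.strip()                               # defensive: trim stray spaces
    return col.replace(renames)                         # whole-value match only

energy = pd.DataFrame({"Country": ["United States of America4",
                                   "United States of America (aaa)",
                                   "Bolivia (Plurinational State of)",
                                   "Slovakia"]})
renames = {"United States of America": "United States"}
energy["Country"] = clean_country_names(energy["Country"], renames)
print(energy["Country"].tolist())
# ['United States', 'United States', 'Bolivia', 'Slovakia']
```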
