Replacing a column's values with values from another column, based on a regular expression, in Python

Posted 2024-05-16 12:00:54

Male | A programmer who enjoys writing Python code.

Here is an excerpt of my DataFrame:

import pandas as pd

data = [
    ['Citroën Amillis', '20 Za Des Baliveaux - 77120 Amillis', '77120', 'ok'],
    ['Relat Paris 9e', 'Métro Opéra - 75009 Paris 9e', 'Paris', 'error'],
    ['Macif Avon', '49 Av Franklin Roosevelt - 77210 Avon', '77210', 'ok'],
    ['Atac La Chapelle-la-Reine', 'Za Rue De L\'avenir - 77760 La Chapelle-la-Reine', 'La', 'error'],
    ['Société Générale La Ferté-Gaucher', '42 Rue De Paris - 77320 La Ferté-Gaucher', 'La', 'error']
]

df = pd.DataFrame(data, columns=['nom_magasin', 'adresse', 'code_postal', 'is_code_postal'])

df

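As you can see, my DataFrame contains errors. For some addresses, especially when the city name is made up of several words, as in "La Chapelle La Reine", the "code_postal" column is wrong.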

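What I want to do is this: if the "is_code_postal" column is "error", replace "code_postal" with the postal code extracted by a regular expression from the "adresse" column.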

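I can't find a solution. To flag the bad rows, I tried this: df['is_code_postal'] = np.where(df.code_postal.str.match('^[a-zA-Z]'), 'error', 'ok'). At first I thought about making all the changes in the same function, but I am missing something.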

One important point: my DataFrame is fairly large (over 250,000 rows), so I am looking for an efficient solution.

Does anyone have an idea?


1 Answer

User

#1 · Posted 2024-05-16 12:00:54

You can ignore the existing postal code column and extract the code directly from "adresse", using the code from Quang:

df['code_postal'] = df['adresse'].str.extract(r'(\d{5})', expand=False)
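If you would rather touch only the rows flagged as "error", as the question asks, a boolean mask with .loc is one way to do it. The following is a minimal sketch, not a tested solution for the full 250,000-row frame. It assumes the postal code is the first standalone run of exactly five digits in "adresse"; the \b word boundaries in the pattern are my own tightening and are not part of Quang's code. Everything here is vectorized, so it should avoid a Python-level loop over the rows.

```python
import pandas as pd

# Small sample taken from the question's data.
data = [
    ['Citroën Amillis', '20 Za Des Baliveaux - 77120 Amillis', '77120', 'ok'],
    ['Relat Paris 9e', 'Métro Opéra - 75009 Paris 9e', 'Paris', 'error'],
    ['Atac La Chapelle-la-Reine', "Za Rue De L'avenir - 77760 La Chapelle-la-Reine", 'La', 'error'],
]
df = pd.DataFrame(data, columns=['nom_magasin', 'adresse', 'code_postal', 'is_code_postal'])

# Boolean mask selecting only the rows flagged as wrong.
mask = df['is_code_postal'].eq('error')

# Extract the first standalone 5-digit run from the address of the flagged rows.
# expand=False returns a Series that keeps the original index.
# Assumption: no other standalone 5-digit number precedes the postal code.
extracted = df.loc[mask, 'adresse'].str.extract(r'\b(\d{5})\b', expand=False)

# Overwrite the flagged rows only; rows marked 'ok' keep their current value.
df.loc[mask, 'code_postal'] = extracted

# Recompute the flag: 'ok' when the value is exactly five digits, else 'error'.
# na=False treats rows where nothing could be extracted as non-matching.
is_valid = df['code_postal'].str.fullmatch(r'\d{5}', na=False)
df['is_code_postal'] = is_valid.map({True: 'ok', False: 'error'})
```

Rows where no five-digit code is found end up with a missing value in "code_postal" and stay flagged as "error", so they can be inspected separately.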
