Replace multiple strings in a column with NaN in Python

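I have a DataFrame whose strings column sometimes holds only direction words (south, north, east, west), either alone or as a comma-separated list:

    id                         strings
0    1                           south
1    2                           north
2    3                            east
3    4                            west
4    5               west, east, south
5    6                      west, west
6    7                    north, north
7    8                    north, south
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest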

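I want those values replaced with NaN, while all other strings (including ones that contain West or Southwest) stay unchanged:

    id                         strings
0    1                             NaN
1    2                             NaN
2    3                             NaN
3    4                             NaN
4    5                             NaN
5    6                             NaN
6    7                             NaN
7    8                             NaN
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest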

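My current attempt chains one replace call per value. It is long, and it still misses combinations that are not listed, such as west, east, south:

import numpy as np

df['strings'].astype(str).replace('south', np.nan).replace('north', np.nan) \
    .replace('west', np.nan).replace('east', np.nan).replace('west, east', np.nan) \
    .replace('west, west', np.nan).replace('north, north', np.nan).replace('west, east', np.nan) \
    .replace('north, south', np.nan)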
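For reference, Series.replace also accepts a list of values, so the chained calls can be collapsed into a single call. This is a minimal sketch on the question's data; it still only replaces exact whole-cell matches, so every combination has to be listed by hand, which is why the answers take a token-based approach instead. The name exact is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': range(1, 13),
    'strings': ['south', 'north', 'east', 'west', 'west, east, south',
                'west, west', 'north, north', 'north, south',
                'West Corporation global office', 'West-Riding',
                'University of West Florida', 'Southwest'],
})

# Exact values to blank out; any combination missing from this list survives.
exact = ['south', 'north', 'east', 'west', 'west, east',
         'west, west', 'north, north', 'north, south']

# A single replace call with a list maps every listed value to NaN.
result = df['strings'].replace(exact, np.nan)
print(result)
```

Row 4 ('west, east, south') is left untouched because it is not an exact entry in the list.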

2 answers

User

Answer 1 · edited 2024-05-26 17:43:21

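First split the values into tokens with Series.str.split(expand=True), forward-fill the missing values that pad the shorter rows, build a mask with DataFrame.isin and DataFrame.all that is True only when every token is a direction word, and finally set the masked rows to missing values with Series.mask: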

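# df is the DataFrame from the question
L = ['south','north','east','west']
# split into tokens, forward-fill the None padding along each row,
# then flag rows whose tokens are all direction words
m = df['strings'].str.split(', ', expand=True).ffill(axis=1).isin(L).all(axis=1)

# replace the flagged rows with NaN
df['strings'] = df['strings'].mask(m)
print(df)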
    id                         strings
0    1                             NaN
1    2                             NaN
2    3                             NaN
3    4                             NaN
4    5                             NaN
5    6                             NaN
6    7                             NaN
7    8                             NaN
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest
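The intermediate steps of that mask can be inspected on a few rows. A minimal sketch, using three values taken from the question's data, showing why the forward fill is there: after str.split(expand=True) the shorter rows are padded with missing values, which are not in L and would otherwise make .all(axis=1) return False.

```python
import pandas as pd

# Three values from the question, enough to show each step.
s = pd.Series(['south', 'west, east, south', 'West-Riding'])
L = ['south', 'north', 'east', 'west']

# Step 1: one column per comma-separated token; shorter rows are padded.
parts = s.str.split(', ', expand=True)

# Step 2: forward-fill along each row so the padding repeats the last real token.
filled = parts.ffill(axis=1)

# Step 3: a row is flagged only if every token is one of the direction words.
m = filled.isin(L).all(axis=1)
print(filled)
print(m.tolist())
```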

Another idea using set, set.isdisjoint and Series.where:

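# run on the original df: after the snippet above the column already holds NaN floats
# True when a value shares no token with L, so where() keeps it; otherwise NaN
m = [set(x.split(', ')).isdisjoint(L) for x in df['strings']]
df['strings'] = df['strings'].where(m)
print(df)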
    id                         strings
0    1                             NaN
1    2                             NaN
2    3                             NaN
3    4                             NaN
4    5                             NaN
5    6                             NaN
6    7                             NaN
7    8                             NaN
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest
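The two ideas agree on the question's data but are not equivalent in general: isdisjoint blanks a value as soon as any token is a direction word, whereas the split/ffill/isin/all mask blanks it only when all tokens are. A minimal sketch comparing them; the value 'west, Paris' is made up for illustration and is not in the question's data. set.issubset is used here as the all-tokens counterpart.

```python
import pandas as pd

L = ['south', 'north', 'east', 'west']
# 'west, Paris' is a hypothetical value, not from the question.
s = pd.Series(['west, west', 'west, Paris', 'Southwest'])

tokens = [set(x.split(', ')) for x in s]

# isdisjoint: keep a value only if it shares NO token with L (any match -> NaN).
any_match = s.where([t.isdisjoint(L) for t in tokens])

# issubset: blank a value only if ALL of its tokens are in L,
# mirroring the split/ffill/isin/all mask of the first approach.
all_match = s.mask([t.issubset(L) for t in tokens])

print(any_match.tolist())
print(all_match.tolist())
```

Pick whichever rule matches the intended meaning of "only directions".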

User

Answer 2 · edited 2024-05-26 17:43:21

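Use Series.replace with a regular expression that matches the direction words as whole, lowercase words:

Example: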

import numpy as np
import pandas as pd

df = pd.DataFrame({'strings': ['south', 'north', 'east', 'west', 'west, east, south', 'west, west', 'north, north', 'north, south', 'West Corporation global office', 'West-Riding', 'University of West Florida', 'Southwest']})
df['R'] = df['strings'].replace(r"\b(south|north|east|west)\b,?", np.nan, regex=True)
print(df)

Output:

                           strings                               R
0                            south                             NaN
1                            north                             NaN
2                             east                             NaN
3                             west                             NaN
4                west, east, south                             NaN
5                       west, west                             NaN
6                     north, north                             NaN
7                     north, south                             NaN
8   West Corporation global office  West Corporation global office
9                      West-Riding                     West-Riding
10      University of West Florida      University of West Florida
11                       Southwest                       Southwest
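One caveat: when the replacement value is not a string (here NaN), pandas replaces the whole cell whenever the pattern matches anywhere inside it, so the trailing ,? has no effect on which rows become NaN, and any value that merely contains a lowercase direction word would also be blanked. A minimal sketch of a stricter variant, swapping in Series.str.fullmatch so that only values made entirely of comma-separated direction words are masked; 'the west wing' is a made-up value, not from the question.

```python
import numpy as np
import pandas as pd

# 'the west wing' is a hypothetical value, not from the question.
s = pd.Series(['west, east, south', 'the west wing', 'Southwest'])

# The answer's pattern: any whole-word match anywhere turns the cell into NaN.
loose = s.replace(r"\b(south|north|east|west)\b,?", np.nan, regex=True)

# Stricter: the WHOLE string must be a comma-separated list of direction words.
word = r"(?:south|north|east|west)"
strict = s.mask(s.str.fullmatch(rf"{word}(?:, {word})*"))

print(loose.tolist())
print(strict.tolist())
```

Both patterns are case-sensitive, so Southwest survives either way.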
