Given a toy dataset as follows:
   id    room   area           situation
0   1   A-102  world  under construction
1   2     NaN     24  under construction
2   3    B309    NaN                 NaN
3   4   C·102     25    under decoration
4   5  E_1089  hello    under decoration
5   6      27    NaN          under plan
6   7      27    NaN                 NaN
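I need to check three columns: room, area, and situation, based on the following conditions:
(1) If the room name contains anything other than digits, letters, or - (NaN is also considered invalid), return "incorrect room name" in the check column.
(2) If area is neither a number nor NaN, return "area is not numbers" and append it to the existing check column.
(3) If situation contains "under decoration", return "decoration is in the content" and append it to the existing check column.
Note that I have other columns to check in the real data, and each new check result needs to be appended using the separator ";".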
How can I get an expected result like this:
   id    room   area           situation  check
0   1   A-102  world  under construction  area is not numbers
1   2     NaN     24  under construction  incorrect room name
2   3    B309    NaN                 NaN  NaN
3   4   C·102     25    under decoration  incorrect room name; decoration is in the content
4   5  E_1089  hello    under decoration  incorrect room name; area is not numbers; decoration is in the content
5   6      27    NaN          under plan  NaN
6   7      27    NaN                 NaN  NaN
My code so far:
Room name check:
df['check'] = np.where(df.room.str.match(r'^[a-zA-Z\d\-]*$'), np.nan, 'incorrect room name')
Output:
   id    room   area           situation  check
0   1   A-102  world  under construction  nan
1   2     NaN     24  under construction  nan
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  incorrect room name
5   6      27    NaN          under plan  nan
6   7      27    NaN                 NaN  nan
Area check:
df['check'] = df['check'].where(df.area.str.contains(r'^\d+$', na=True),
                                'area is not a numbers')
Output:
   id    room   area           situation  check
0   1   A-102  world  under construction  area is not a numbers
1   2     NaN     24  under construction  nan
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  area is not a numbers
5   6      27    NaN          under plan  nan
6   7      27    NaN                 NaN  nan
Situation check:
df['check'] = df['check'].where(df.situation.str.contains('under decoration', na = True),
'decoration is in the content')
Output:
   id    room   area           situation  check
0   1   A-102  world  under construction  decoration is in the content
1   2     NaN     24  under construction  decoration is in the content
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  area is not a numbers
5   6      27    NaN          under plan  decoration is in the content
6   7      27    NaN                 NaN  nan
Thanks.
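I slightly modified your conditions so that the result is closer to your expected output.
You can try the following approach: first compute the output of each test separately, then zip the resulting arrays and apply a custom function that joins the values that are not missing.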
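The code for this answer did not survive on this page, so here is a minimal sketch of the zip-and-join idea described above, not the original answerer's code. It assumes room and area are stored as strings, and the names ok_room, ok_area, has_deco, and join_messages are made up for illustration. Each test produces an array holding either a message or None, and the three arrays are then zipped row by row.

```python
import numpy as np
import pandas as pd

# Toy data from the question; room and area are assumed to be strings.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", "27", "27"],
    "area": ["world", "24", np.nan, "25", "hello", np.nan, np.nan],
    "situation": ["under construction", "under construction", np.nan,
                  "under decoration", "under decoration", "under plan", np.nan],
})

# One test per rule. The na argument decides how missing values are treated:
# a missing room is invalid (na=False), a missing area is allowed (na=True).
ok_room = df["room"].str.match(r"^[a-zA-Z\d\-]*$", na=False)
ok_area = df["area"].str.contains(r"^\d+$", na=True)
has_deco = df["situation"].str.contains("under decoration", na=False)

# Each test yields its own array: a message where the rule fires, else None.
room_msg = np.where(ok_room, None, "incorrect room name")
area_msg = np.where(ok_area, None, "area is not numbers")
deco_msg = np.where(has_deco, "decoration is in the content", None)

def join_messages(*msgs):
    """Join the messages that are present with '; ', or return NaN if none."""
    kept = [m for m in msgs if m is not None]
    return "; ".join(kept) if kept else np.nan

# Zip the per-rule arrays so that each row sees all of its messages at once.
df["check"] = [join_messages(*row) for row in zip(room_msg, area_msg, deco_msg)]
print(df)
```

Because every rule writes to its own array instead of overwriting check, adding a test for another column only means adding one more array to the zip call.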