I have a Python dictionary like this:
ref_dict = {
    "Company1": ["C1_Dev1", "C1_Dev2", "C1_Dev3", "C1_Dev4", "C1_Dev5"],
    "Company2": ["C2_Dev1", "C2_Dev2", "C2_Dev3", "C2_Dev4", "C2_Dev5"],
    "Company3": ["C3_Dev1", "C3_Dev2", "C3_Dev3", "C3_Dev4", "C3_Dev5"],
}
I have a pandas DataFrame named df, and one of its columns looks like this:
DESC_DETAIL
0 Probably task Company2 C2_Dev5
1 File system C3_Dev1
2 Weather subcutaneous Company2
3 Company1 Travesty C1_Dev3
4 Does not match anything
...........
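My goal is to add two extra columns to this DataFrame, named COMPANY and DEVICE. The value in each row of the COMPANY column should be the company key from the dictionary, if either that key appears in the DESC_DETAIL column or one of its devices appears there. The value in the DEVICE column should simply be the device string found in the DESC_DETAIL column. If no match is found, the corresponding cell is left empty. So the final output would look like this: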
   DESC_DETAIL                      COMPANY   DEVICE
0  Probably task Company2 C2_Dev5   Company2  C2_Dev5
1  File system C3_Dev1              Company3  C3_Dev1
2  Weather subcutaneous Company2    Company2  NaN
3  Company1 Travesty C1_Dev3        Company1  C1_Dev3
4  Does not match anything          NaN       NaN
My attempt:
for key, value in ref_dict.items():
    df['COMPANY'] = df.apply(lambda row: key if row['DESC_DETAIL'].isin(key) else Nan, axis=1)
This is obviously wrong and does not work. How can I make it work?
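You can build a regular expression pattern from the dictionary and use str.extract to pull the company and device values out of DESC_DETAIL.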
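The answer's original code for this step did not survive, so the following is only a minimal sketch of the idea, assuming the sample data from the question. The names `company_pat` and `device_pat` are illustrative, not taken from the original answer. Each pattern is an alternation of the escaped names wrapped in one capture group, which is what `Series.str.extract` returns.

```python
import re
import pandas as pd

ref_dict = {
    "Company1": ["C1_Dev1", "C1_Dev2", "C1_Dev3", "C1_Dev4", "C1_Dev5"],
    "Company2": ["C2_Dev1", "C2_Dev2", "C2_Dev3", "C2_Dev4", "C2_Dev5"],
    "Company3": ["C3_Dev1", "C3_Dev2", "C3_Dev3", "C3_Dev4", "C3_Dev5"],
}

# Sample frame rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "DESC_DETAIL": [
        "Probably task Company2 C2_Dev5",
        "File system C3_Dev1",
        "Weather subcutaneous Company2",
        "Company1 Travesty C1_Dev3",
        "Does not match anything",
    ]
})

companies = list(ref_dict)
devices = [dev for devs in ref_dict.values() for dev in devs]

# One capture group per pattern; \b keeps "Company1" from matching inside "Company10".
company_pat = r"\b(" + "|".join(map(re.escape, companies)) + r")\b"
device_pat = r"\b(" + "|".join(map(re.escape, devices)) + r")\b"

# expand=False returns a Series (first match per row, NaN when nothing matches).
df["COMPANY"] = df["DESC_DETAIL"].str.extract(company_pat, expand=False)
df["DEVICE"] = df["DESC_DETAIL"].str.extract(device_pat, expand=False)
```

At this point row 1 ("File system C3_Dev1") has a DEVICE but no COMPANY, because the company name itself never appears in the text.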
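You also need a device-to-company dictionary, which is easy to build from ref_dict. With it, filling in the COMPANY value for rows that mention only a device is straightforward.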
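As a hedged sketch of that step (the original code is missing, and `device_to_company` is a name chosen here for illustration): invert `ref_dict` with a dict comprehension, map the DEVICE column through it, and use the result only where COMPANY is still missing. The frame below hard-codes the COMPANY and DEVICE values that a regex extraction over the question's sample rows would produce, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

ref_dict = {
    "Company1": ["C1_Dev1", "C1_Dev2", "C1_Dev3", "C1_Dev4", "C1_Dev5"],
    "Company2": ["C2_Dev1", "C2_Dev2", "C2_Dev3", "C2_Dev4", "C2_Dev5"],
    "Company3": ["C3_Dev1", "C3_Dev2", "C3_Dev3", "C3_Dev4", "C3_Dev5"],
}

# Assumed intermediate state: company and device already extracted by name.
df = pd.DataFrame({
    "DESC_DETAIL": [
        "Probably task Company2 C2_Dev5",
        "File system C3_Dev1",
        "Weather subcutaneous Company2",
        "Company1 Travesty C1_Dev3",
        "Does not match anything",
    ],
    "COMPANY": ["Company2", None, "Company2", "Company1", None],
    "DEVICE": ["C2_Dev5", "C3_Dev1", None, "C1_Dev3", None],
})

# Invert the mapping: each device points back to its company.
device_to_company = {
    dev: company for company, devs in ref_dict.items() for dev in devs
}

# Keep an explicitly named company; otherwise fall back to the device's company.
df["COMPANY"] = df["COMPANY"].fillna(df["DEVICE"].map(device_to_company))
```

Filling with `fillna` rather than overwriting means a company named in the text wins over one inferred from the device; whether that is the right precedence when the two disagree is a design choice the question does not settle.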