I have the following DataFrame:
import pandas as pd

test = {'title': ['Undeclared milk in Burnbrae', 'Undeclared milk in certain Bumble', 'Certain cheese products may contain listeria', 'Ocean brand recalled due to Salmonella', 'IQF Raspberries due to Listeria']}
example = pd.DataFrame(test)
example
title
0 Undeclared milk in Burnbrae
1 Undeclared milk in certain Bumble
2 Certain cheese products may contain listeria
3 Ocean brand recalled due to Salmonella
4 IQF Raspberries due to Listeria
I want to extract the following strings into a single column. I would like my result to look like this:
test = {'hazard': ['Undeclared milk', 'Undeclared milk', 'listeria', 'Salmonella', 'Listeria'], 'title': ['Undeclared milk in Burnbrae', 'Undeclared milk in certain Bumble', 'Certain cheese products may contain listeria', 'Ocean brand recalled due to Salmonella', 'IQF Raspberries due to Listeria']}
example2 = pd.DataFrame(test)
example2
hazard title
0 Undeclared milk Undeclared milk in Burnbrae
1 Undeclared milk Undeclared milk in certain Bumble
2 listeria Certain cheese products may contain listeria
3 Salmonella Ocean brand recalled due to Salmonella
4 Listeria IQF Raspberries due to Listeria
Essentially, my delimiters are "in", "may contain", and "due to".
example['hazard'] = example['title'].str.extract(r'^(.*?) in\b')
example['hazard'] = example['title'].str.extract(r'\b may contain (.*)$')
example['hazard'] = example['title'].str.extract(r'\b due to (.*)$')
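I wrote the code above to test each delimiter separately, but I want the matches for all of the delimiters to end up in the same column.
How can I do this?
I appreciate any help.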
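You can collect the delimiters in a list and join them with "|".join to turn them into one larger pattern. From there, Series.str.extract can capture the matches for every delimiter, and the result is reshaped to match the size of the original column.

A more first-principles approach can produce the same result.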
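The answer's original code is not preserved here, so the following is only a minimal sketch of the two ideas, not the answerer's exact solution. It assumes that the hazard comes before "in" but after "may contain" and "due to", as in the expected output. The names `before`, `after`, `parts`, `find_hazard`, and `hazard_plain` are hypothetical and chosen for illustration.

```python
import pandas as pd

example = pd.DataFrame({"title": [
    "Undeclared milk in Burnbrae",
    "Undeclared milk in certain Bumble",
    "Certain cheese products may contain listeria",
    "Ocean brand recalled due to Salmonella",
    "IQF Raspberries due to Listeria",
]})

# Assumption: the hazard precedes "in" and follows "may contain" / "due to".
before = "|".join(["in"])
after = "|".join(["may contain", "due to"])

# Approach 1: one combined pattern with two named capture groups.
pattern = rf"^(?P<before>.*?) (?:{before})\b|\b(?:{after}) (?P<after>.*)$"
parts = example["title"].str.extract(pattern)

# Only one alternative can match per row, so at most one group is filled.
# Take "before" where it matched and fall back to "after" otherwise.
example["hazard"] = parts["before"].fillna(parts["after"])


# Approach 2: plain string handling, without a regular expression.
def find_hazard(title):
    """Return the hazard named in a title, or None if no delimiter is found."""
    head, sep, _ = title.partition(" in ")
    if sep:
        return head  # text before "in"
    for delim in (" may contain ", " due to "):
        _, sep, tail = title.partition(delim)
        if sep:
            return tail  # text after the delimiter
    return None


hazard_plain = example["title"].map(find_hazard)

print(example[["hazard", "title"]])
```

On the sample titles both approaches give the same hazards. They can differ on other input: for instance, the regular expression accepts "in" followed by punctuation, while the plain version requires a space on both sides.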