Regex to match only subheading titles

import pandas as pd

Fairytales_in = {'Titles': ['Fairy Tales',
                            'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                            'Tales.2.4.6.Red Riding Hood',
                            'Fairies.1Your own Fairy godmother',
                            'Ogres-1.1.The wondrous world of Shrek',
                            'Witches-1-4Maleficient and the malicious curse',
                            'Tales.2.1.The big bad wolf',
                            'Tales.2.Little Red riding Hood',
                            'Tales.2.4.6.1.Why the huntsman is underrated',
                            'Tales.5.f.Cinderella and the pumpkin carriage',
                            'Ogres-1.Best Ogre in town',
                            'No.3.Great Expectations']}
Fairytales_in = pd.DataFrame.from_dict(Fairytales_in)

This would be my expected output:

Fairytales_expected_output = {'Titles': ['Fairy Tales',
                                         'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                                         'Tales.2.4.6.Red Riding Hood',
                                         'Fairies.1Your own Fairy godmother',
                                         'Ogres-1.1.The wondrous world of Shrek',
                                         'Witches-1-4Maleficient and the malicious curse',
                                         'Tales.2.1.The big bad wolf',
                                         'Tales.2.Little Red riding Hood',
                                         'Tales.2.4.6.1.Why the huntsman is underrated',
                                         'Tales.5.f.Cinderella and the pumpkin carriage',
                                         'Ogres-1.Best Ogre in town',
                                         'No.3.Great Expectations'],
                              'Subheading': ['NaN',
                                             'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                                             'NaN',
                                             'NaN',
                                             'Ogres-1.1.The wondrous world of Shrek',
                                             'Witches-1-4Maleficient and the malicious curse',
                                             'Tales.2.1.The big bad wolf',
                                             'NaN',
                                             'NaN',
                                             'Tales.5.f.Cinderella and the pumpkin carriage',
                                             'NaN',
                                             'NaN']}
Fairytales_expected_output = pd.DataFrame.from_dict(Fairytales_expected_output)

1 answer

User

#1 · Posted 2024-05-29 06:45:27

You can use:

rx = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}(?![.-]?\d).*)'
Fairytales_in['Subheading'] = Fairytales_in['Titles'].str.extract(rx, expand=False)

See the regex demo.
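The `expand=False` argument matters here because the pattern has exactly one capture group. By default `Series.str.extract` returns a DataFrame with one column per group; with `expand=False` and a single group it returns a Series, which can be assigned directly to a new column. A minimal sketch using two titles from the question (the variable names are illustrative only):

```python
import pandas as pd

rx = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}(?![.-]?\d).*)'
s = pd.Series(['Fairy Tales', 'Tales.2.1.The big bad wolf'])

# Default expand=True: a DataFrame, one column per capture group (named 0 here)
as_frame = s.str.extract(rx)
# expand=False with a single group: a Series; rows without a match become NaN
as_series = s.str.extract(rx, expand=False)

print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame [0]
print(type(as_series).__name__)                          # Series
print(as_series.tolist())
```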

Details:

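^ - start of the string
( ... ) - capturing group 1 around the rest of the pattern; this is what str.extract returns
\w+ - one or more word characters
(?:[.-](?:\d+|[a-zA-Z]\b)){2} - exactly two occurrences of:
- [.-] - a dot or a hyphen
- (?:\d+|[a-zA-Z]\b) - one or more digits, or a single ASCII letter followed by a word boundary
(?![.-]?\d) - a negative lookahead that fails the match if an optional . or - followed by a digit appears immediately to the right of the current position (i.e. there must be no third numeric level)
.* - zero or more characters other than line breaks, as many as possible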
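To see what the negative lookahead contributes, the sketch below runs a few of the question's titles through Python's `re` module against three patterns: the one above, a copy with the lookahead removed, and a hypothetical variant `(?![.-]\d)` in which the separator is mandatory. The title `Tales.2.45.6.Example` is made up (not from the question) to illustrate why `[.-]?` is optional: without it, `\d+` can backtrack from `45` to `4`, and the lookahead then sees `5` instead of a separator and lets the match through.

```python
import re

rx = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}(?![.-]?\d).*)'
# Same pattern without the lookahead: any title with at least two levels matches
rx_no_lookahead = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}.*)'
# Hypothetical variant with a mandatory separator inside the lookahead
rx_strict_sep = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}(?![.-]\d).*)'

samples = [
    'Tales.2.1.The big bad wolf',                      # exactly two levels
    'Witches-1-4Maleficient and the malicious curse',  # '-' separators, text follows directly
    'Tales.5.f.Cinderella and the pumpkin carriage',   # letter level: word boundary after 'f'
    'Tales.2.Little Red riding Hood',                  # 'L' is followed by 'i', so the letter branch fails
    'Tales.2.4.6.Red Riding Hood',                     # third numeric level
    'Tales.2.45.6.Example',                            # made-up title, multi-digit second level
]

# For each title: (matches rx, matches rx_no_lookahead, matches rx_strict_sep)
results = {}
for title in samples:
    row = tuple(re.match(p, title) is not None
                for p in (rx, rx_no_lookahead, rx_strict_sep))
    results[title] = row
    print(row, title)
```

Only the original pattern rejects both of the last two titles.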

Pandas test:

>>> Fairytales_in['Titles'].str.extract(rx, expand=False)
0                                                NaN
1       Tales.3.2.Dancing Shoes, ballgowns and frogs
2                                                NaN
3                                                NaN
4              Ogres-1.1.The wondrous world of Shrek
5     Witches-1-4Maleficient and the malicious curse
6                         Tales.2.1.The big bad wolf
7                                                NaN
8                                                NaN
9      Tales.5.f.Cinderella and the pumpkin carriage
10                                               NaN
11                                               NaN
Name: Titles, dtype: object
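The question's expected output spells missing subheadings as the string `'NaN'`, whereas `str.extract` produces real missing values (usually preferable, since `isna()` and `dropna()` work on them). If the literal strings are really wanted, a sketch like the following converts them with `fillna` and checks the column against the expected list; `expected` and `subheading_as_text` are illustrative names:

```python
import pandas as pd

titles = ['Fairy Tales',
          'Tales.3.2.Dancing Shoes, ballgowns and frogs',
          'Tales.2.4.6.Red Riding Hood',
          'Fairies.1Your own Fairy godmother',
          'Ogres-1.1.The wondrous world of Shrek',
          'Witches-1-4Maleficient and the malicious curse',
          'Tales.2.1.The big bad wolf',
          'Tales.2.Little Red riding Hood',
          'Tales.2.4.6.1.Why the huntsman is underrated',
          'Tales.5.f.Cinderella and the pumpkin carriage',
          'Ogres-1.Best Ogre in town',
          'No.3.Great Expectations']
# 'Subheading' column from the question's expected output
expected = ['NaN', titles[1], 'NaN', 'NaN', titles[4], titles[5], titles[6],
            'NaN', 'NaN', titles[9], 'NaN', 'NaN']

Fairytales_in = pd.DataFrame({'Titles': titles})
rx = r'^(\w+(?:[.-](?:\d+|[a-zA-Z]\b)){2}(?![.-]?\d).*)'
Fairytales_in['Subheading'] = Fairytales_in['Titles'].str.extract(rx, expand=False)

# Replace real missing values with the literal string 'NaN' used in the question
subheading_as_text = Fairytales_in['Subheading'].fillna('NaN')
print(subheading_as_text.tolist() == expected)  # True
```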
