Extracting last names from a list of full names using Python/Pandas and regular expressions

s = ['DR. James Coffins', 'Zacharias Pallefas', 'Matthew Ebnel', 'Ranzzith Redly', 'GEORGE GEORGIADAKIS', 'HARISH KUMARAN K', 'Christiaan Kraanlen, CFA', 'Mary K. Lein, CFA, COL', 'Alexandre Cegra, CFA, CAIA' 'Anna Bely']

Loop through the elements of the list. For each element, split it into subelements on spaces. Then:

a) If there are four or fewer subelements, start from the beginning and examine the first four subelements.
a1) If the first subelement is longer than 2 letters: if the second subelement is longer than one letter, return the second subelement; otherwise, return the third subelement.
a2) If the first subelement is 2 letters, drop it and repeat step a1.
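The steps above can be sketched in plain Python as follows. This is only one reading of the rules, under several assumptions that the question does not state: "letters" counts alphabetic characters only (so 'DR.' counts as 2 letters), trailing commas and periods are stripped from the result, and any case the rules do not describe (more than four subelements, or too few subelements) returns None. The names count_letters and extract_last_name are hypothetical.

```python
def count_letters(token):
    """Count alphabetic characters only, so 'DR.' has 2 letters (assumption)."""
    return sum(ch.isalpha() for ch in token)


def extract_last_name(full_name):
    """Apply steps a, a1 and a2; return None for cases the steps do not cover."""
    parts = full_name.split()

    # Step a only covers names with four or fewer subelements.
    if len(parts) > 4:
        return None

    # Step a2: a 2-letter first subelement (e.g. a title) is dropped.
    if parts and count_letters(parts[0]) == 2:
        parts = parts[1:]

    # Step a1 needs a first subelement longer than 2 letters and a second one.
    if len(parts) < 2 or count_letters(parts[0]) <= 2:
        return None

    if count_letters(parts[1]) > 1:
        candidate = parts[1]
    elif len(parts) > 2:
        candidate = parts[2]  # second subelement was an initial
    else:
        return None

    # Assumption: strip trailing punctuation such as the comma in 'Kraanlen,'.
    return candidate.strip(".,")


for name in ['DR. James Coffins', 'HARISH KUMARAN K', 'Christiaan Kraanlen, CFA']:
    print(name, '->', extract_last_name(name))
```

Note that 'Mary K. Lein, CFA, COL' has five subelements, so this sketch returns None for it; the rules as written only describe the four-or-fewer case.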

2 Answers

User

#1 · Edited 2024-04-23 17:11:27

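How about always grabbing the second element of each line, after skipping any word that contains a '.' or appears in the exclusion list ['dr', 'mr', 'mrs', 'mrs', 'miss', 'prof']?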

>>> exclude_tags = ['dr', 'mr', 'mrs', 'mrs', 'miss', 'prof']
>>> [[y for y in x.split() if '.' not in y and y.lower() not in exclude_tags][1].rstrip(',').capitalize() for x in s]
['Coffins', 'Pallefas', 'Ebnel', 'Redly', 'Georgiadakis', 'Kumaran', 'Kraanlen', 'Lein', 'Cegra']
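The one-liner raises an IndexError when fewer than two words survive the filter (for example a single-word name). A guarded variant might look like the sketch below; the function name second_kept_token and the choice to return None in that case are assumptions, not part of the answer.

```python
EXCLUDE_TAGS = {'dr', 'mr', 'mrs', 'miss', 'prof'}


def second_kept_token(full_name):
    """Return the second word left after filtering, or None if there is none."""
    kept = [
        token for token in full_name.split()
        if '.' not in token and token.lower() not in EXCLUDE_TAGS
    ]
    if len(kept) < 2:
        return None  # assumption: signal "no answer" instead of raising
    return kept[1].rstrip(',').capitalize()


print(second_kept_token('DR. James Coffins'))       # Coffins
print(second_kept_token('Mary K. Lein, CFA, COL'))  # Lein
print(second_kept_token('Cher'))                    # None
```

If the names live in a pandas Series, the same function can be applied element-wise with Series.apply. Keep in mind that str.capitalize lowercases everything after the first character, so a mixed-case name such as 'McDonald' would come back as 'Mcdonald'.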

User

#2 · Edited 2024-04-23 17:11:27

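For anyone else who comes across this question, keep in mind that, in general, it is impossible to extract a person's last name from a full name. Please read Falsehoods Programmers Believe About Names.

Sunitha's solution will fail for anyone whose last name consists of multiple tokens (Van Gogh), who has multiple last names (Gonzalez Ramirez), whose first name has multiple tokens (Mary Jane Watson), who chose to include a middle name in whatever system created this list, who comes from an Asian culture where the given name/family name order is sometimes reversed, and so on.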
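To make these failure modes concrete, here is a small sketch that wraps the one-liner from the first answer in a function (the name answer_one_rule is hypothetical) and runs it on a few illustrative names. The first names 'Vincent' and 'Maria' and the name 'Wang Xiaoming' are made-up examples added only so that each case has at least two words.

```python
EXCLUDE_TAGS = ['dr', 'mr', 'mrs', 'mrs', 'miss', 'prof']


def answer_one_rule(full_name):
    """Same logic as the one-liner in the first answer, for a single name."""
    kept = [
        token for token in full_name.split()
        if '.' not in token and token.lower() not in EXCLUDE_TAGS
    ]
    return kept[1].rstrip(',').capitalize()


# Illustrative inputs mapped to the last name a human reader would expect.
cases = {
    'Vincent van Gogh': 'van Gogh',                  # multi-token last name
    'Maria Gonzalez Ramirez': 'Gonzalez Ramirez',    # two last names
    'Mary Jane Watson': 'Watson',                    # multi-token first name
    'Wang Xiaoming': 'Wang',                         # family name written first
}

for name, expected in cases.items():
    print(f'{name!r}: got {answer_one_rule(name)!r}, expected {expected!r}')
```

In each of these cases the function returns something other than the expected last name ('Van', 'Gonzalez', 'Jane' and 'Xiaoming' respectively), and nothing in the input string alone tells the code which reading is correct.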
