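I need to extract unique names that carry a title, such as Lord | Baroness | Lady | Baron, from a text and then match them against another list. I am struggling to get the right result and hope the community can help. Thanks.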
import re

def get_names(text):
    # find noble titles and grab each one together with the name that follows
    match = re.compile(r'(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+) ([A-Z][a-z]+)')
    names = list(set(match.findall(text)))
    # remove duplicates based on the index in tuples
    names_ = list(dict((v[1], v) for v in sorted(names, key=lambda names: names[0])).values())
    names_lst = list(set([' '.join(map(str, name)) for name in names_]))
    return names_lst

text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
names_lst = get_names(text)
print(names_lst)
Current output: ['Baroness Firstname Surname']
Desired output: ['Baroness Firstname Surname', 'Lady Anothername']
but not Lady Surname or Lady Firstname.
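I then need to match the result against this list:
other_names = ['Firstname Surname', 'James', 'Simon Smith']
and remove the element 'Firstname Surname' from it, because it matches the first name and surname of the Baroness in the desired output.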
I suggest the following solution:
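One possible approach, offered as a minimal sketch rather than a definitive answer. The pattern in the question requires two capitalised words after the title, so 'Lady Anothername' can never match. The sketch makes the second word optional. It also assumes (this rule is inferred from the desired output, not stated in the question) that a titled name should be dropped when any of its words already belongs to a name that was kept, which is what excludes 'Lady Surname' and 'Lady Firstname'. The names `TITLE_RE` and `remove_matched` are hypothetical helpers introduced here for illustration.

```python
import re

# Title followed by one or two capitalised words (the second word is optional).
TITLE_RE = re.compile(
    r'\b(Lord|Baroness|Lady|Baron)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)'
)

def get_names(text):
    """Return titled names, skipping short forms of names already kept."""
    matches = TITLE_RE.findall(text)
    # Assumption: longer names win, so handle them first. The sort is stable,
    # so names of equal length keep the order in which they appear in the text.
    matches.sort(key=lambda m: len(m[1].split()), reverse=True)
    seen_words = set()
    names = []
    for title, name in matches:
        words = name.split()
        # Skip a name if any of its words is already part of a kept name.
        if any(word in seen_words for word in words):
            continue
        seen_words.update(words)
        names.append(f'{title} {name}')
    return names

def remove_matched(other_names, titled_names):
    """Drop entries of other_names that equal a titled name minus its title."""
    bare_names = {name.split(' ', 1)[1] for name in titled_names}
    return [name for name in other_names if name not in bare_names]

text = ('Baroness Firstname Surname and Baroness who is also known as '
        'Lady Anothername and Lady Surname or Lady Firstname.')
names_lst = get_names(text)
print(names_lst)

other_names = ['Firstname Surname', 'James', 'Simon Smith']
print(remove_matched(other_names, names_lst))
```

With the sample text from the question, the first print shows ['Baroness Firstname Surname', 'Lady Anothername'] and the second shows ['James', 'Simon Smith']. Note that the word-overlap rule is deliberately coarse: two different people who share a surname would be collapsed into one, so it may need refining for real data.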