Python regex to extract addresses and travel time from a sentence?

2 Answers

User

Answer 1 · Edited 2024-04-20 14:04:01

from\s(?P<Origin>[\d\w\s]*?)\sto\s(?P<Dest>[\d\w\s]*?)(?:$|(?P<Time>\b(?:tomorrow|at)\b.*))

You can see my solution in a live online demo at regex101.com.

There are three named capture groups, one for each target variable.
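For reference, here is a minimal sketch of how the named groups could be read in Python. The helper name `parse_trip` and the step that drops a trailing period are my own assumptions, not part of the pattern above. Without that step, a sentence that ends in "." and has no time part does not match, because neither `$` nor the Time group can match at the period.

```python
import re

# The pattern from this answer, written as raw strings.
PATTERN = re.compile(
    r"from\s(?P<Origin>[\d\w\s]*?)\sto\s(?P<Dest>[\d\w\s]*?)"
    r"(?:$|(?P<Time>\b(?:tomorrow|at)\b.*))"
)


def parse_trip(sentence):
    """Return a dict with Origin, Dest and Time (None when absent), or None."""
    # Assumption: a trailing period is noise, so drop it before matching.
    match = PATTERN.search(sentence.strip().rstrip("."))
    if match is None:
        return None
    # The lazy Dest group can keep a trailing space, so strip every value.
    return {
        name: value.strip() if value else None
        for name, value in match.groupdict().items()
    }


for line in (
    "I want to go from Cosmos Station to 525 Greenlane highway.",
    "I want to go from Cosmos Station to 525 Greenlane highway tomorrow at 8am.",
    "I want to go from Cosmos Station to 525 Greenlane highway at 8am",
):
    print(parse_trip(line))
```

`match.groupdict()` returns one entry per named group, and a group that did not take part in the match (here, Time) comes back as `None`.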

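You will notice that in the Time capture group I have (tomorrow|at), which matches the word that starts the time substring.

While this works for your specific problem, it has to be extended to cover every other time value you may need to check for.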
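One way to extend it, as a rough sketch: build the alternation from a list of start words. The list `TIME_STARTERS` and the helper `build_pattern` are hypothetical names, and the extra words are only examples, not a complete set.

```python
import re

# Hypothetical list of words that may start the time part; extend as needed.
TIME_STARTERS = ["tomorrow", "today", "tonight", "at", "around", "by"]


def build_pattern(starters):
    """Compile the pattern from this answer with a custom set of start words."""
    # Escape each word and join them into a single alternation.
    alternation = "|".join(re.escape(word) for word in starters)
    return re.compile(
        r"from\s(?P<Origin>[\d\w\s]*?)\sto\s(?P<Dest>[\d\w\s]*?)"
        r"(?:$|(?P<Time>\b(?:" + alternation + r")\b.*))"
    )


pattern = build_pattern(TIME_STARTERS)
match = pattern.search(
    "I want to go from Cosmos Station to 525 Greenlane highway tonight around 9pm"
)
if match:
    print(match.group("Origin"), "|", match.group("Dest").strip(), "|", match.group("Time"))
```

Keep in mind the trade-off: every word added to the list is treated as the start of the time part wherever it appears as a whole word after "to", so a destination that itself contains "at" or "by" would be cut short.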

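Without knowing which assumptions we can and cannot make, it is hard to write a regex that catches every edge case, so feel free to post the complete set of expected inputs.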

User

Answer 2 · Edited 2024-04-20 14:04:01

This works for the given samples:

import re

string = """
I want to go from Cosmos Station to 525 Greenlane highway.
I want to go from Cosmos Station to 525 Greenlane highway tomorrow at 8am.
I want to go from Cosmos Station to 525 Greenlane highway at 8am
"""
# To keep the pattern readable, the time separator is defined on its own.
# In your examples it is either "at" or "tomorrow at"; you can add more.
at_separators = {'at': '(?:(?:tomorrow at)|(?:at))'}
# After "to", capture the rest of the line if no "at" separator follows it;
# otherwise the second group captures the text between "to" and the separator.
pattern = r'from\s(.+?)\sto\s(.+?(?=\s{at})|.+(?!{at}\s))(?:\s{at}(.+))?'.format(**at_separators)
pattern = re.compile(pattern, flags=re.MULTILINE)
# Now you just need to clean the results (strip the '.' and other noise),
# because doing that inside the pattern would make it unreadable.
print(re.findall(pattern, string))

Output:

[('Cosmos Station', '525 Greenlane highway.', ''), ('Cosmos Station', '525 Greenlane highway', ' 8am.'), ('Cosmos Station', '525 Greenlane highway', ' 8am')]

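As you can see, in the first tuple the third item is an empty string because there is no time. The key is the positive lookahead .+?(?=\s{at}): it does not consume the time part, but stops just before the text that (?:\s{at}(.+))? then matches.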
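The clean-up step mentioned in the code comments could look roughly like this. It is only a sketch: the helper names `clean` and `extract_trips` are made up for illustration, and it assumes that surrounding whitespace and a trailing period are the only noise to remove.

```python
import re

# Same separator alternation as in the answer, with simplified grouping.
AT = r"(?:tomorrow at|at)"
PATTERN = re.compile(
    r"from\s(.+?)\sto\s(.+?(?=\s{at})|.+(?!{at}\s))(?:\s{at}(.+))?".format(at=AT)
)


def clean(field):
    # Assumption: surrounding whitespace and a trailing period are noise.
    return field.strip().rstrip(".").strip()


def extract_trips(text):
    """Return (origin, destination, time) tuples; time is '' when absent."""
    return [tuple(clean(f) for f in groups) for groups in PATTERN.findall(text)]


sample = """
I want to go from Cosmos Station to 525 Greenlane highway.
I want to go from Cosmos Station to 525 Greenlane highway tomorrow at 8am.
I want to go from Cosmos Station to 525 Greenlane highway at 8am
"""
for trip in extract_trips(sample):
    print(trip)
```

Doing the clean-up in plain Python keeps the regex unchanged, at the cost of a second pass over each captured field.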
