Python re.compile for strings containing variables and numbers

数据工程师

Asked 2025-04-16 16:18

Hi, I want to get a match for the following:

test = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')

against this input:

print test.match(" 3pm 2 days from today")

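But it returns None. Where did I go wrong? I'm new to regular expressions, and after reading the docs I thought this should work! Any help is appreciated.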

chrism

--------------------------------------------------------------------------------------

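I am asking a new question about designing a system that uses something like the above and involves natural language processing; the details can be found here.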

7 Answers

Try this:

test = re.compile(r' \d+(am|pm) \d+ days from (yesterday|today|tomorrow)')
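
A quick check of the pattern above against the input from the question, as a minimal sketch. The second call is only there to show that \d+ does no range checking, so an impossible hour such as 99pm still matches.

```python
import re

# Raw string, so the backslash in \d reaches the regex engine unchanged.
test = re.compile(r' \d+(am|pm) \d+ days from (yesterday|today|tomorrow)')

m = test.match(" 3pm 2 days from today")
print(m.group(0))   # the full matched text
print(m.groups())   # the two capturing groups: ('pm', 'today')

# \d+ accepts any run of digits, including out-of-range hours.
loose = test.match(" 99pm 2 days from today")
print(loose is not None)
```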

Answered 2025-04-16 by Python大师

How about this:

test = re.compile(r' ([0-9]|1[012])(am|pm) \d+ days from (yesterday|today|tomorrow)')

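The hours part matches either a single digit 0-9 or a 1 followed by 0, 1 or 2, so it covers 0 through 12 but rejects 13 through 19.

You can restrict the days part in a similar way, for example to 1 through 1000 with (1000|\d{1,3}). Note that \d{1,3} also accepts 0 and leading zeros such as 007; use (1000|[1-9]\d{0,2}) if those must be excluded.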
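
To see why the original pattern returned None, it may help to test the bracket expressions on their own. Inside square brackets, [0-12] is a character class (the range 0-1 plus the literal 2) and matches exactly one character, not a number between 0 and 12. The sketch below checks that, then the alternation-based replacements; the variable names are only illustrative.

```python
import re

# [0-12] means: one character from the range 0-1, or the literal 2.
hour_class = re.compile(r'[0-12]')
print(hour_class.fullmatch('2') is not None)    # True: '2' is in the set
print(hour_class.fullmatch('3') is not None)    # False: '3' is not in the set
print(hour_class.fullmatch('12') is not None)   # False: a class matches one char

# [1-1000] means: the range 1-1 plus literal 0s, i.e. one character, '0' or '1'.
day_class = re.compile(r'[1-1000]')
print(day_class.fullmatch('2') is not None)     # False

# Numeric ranges have to be spelled out with alternation instead.
hour = re.compile(r'(?:1[0-2]|[0-9])')          # 0 through 12
days = re.compile(r'(?:1000|[1-9]\d{0,2})')     # 1 through 1000, no leading zeros
print(hour.fullmatch('12') is not None, hour.fullmatch('13') is not None)
print(days.fullmatch('1000') is not None, days.fullmatch('0') is not None)
```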

Answered 2025-04-16 by Python大师

Here is my take. Studying this regex carefully should teach you a thing or two:

import re
reobj = re.compile(
    r"""# Loosely match a date/time reference
    ^                    # Anchor to start of string.
    \s*                  # Optional leading whitespace.
    (?P<time>            # $time: military or AM/PM time.
      (?:                # Group for military hours options.
        [2][0-3]         # Hour is either 20, 21, 22, 23,
      | [01]?[0-9]       # or 0-9, 00-09 or 10-19
      )                  # End group of military hours options.
      (?:                # Group for optional minutes.
        :                # Hours and minutes separated by ":"
        [0-5][0-9]       # 00-59 minutes
      )?                 # Military minutes are optional.
    |                    # or time is given in AM/PM format.
      (?:1[0-2]|0?[1-9]) # 1-12 or 01-12 AM/PM options (hour)
      (?::[0-5][0-9])?   # Optional minutes for AM/PM time.
      \s*                # Optional whitespace before AM/PM.
      [ap]m              # Required AM or PM (case insensitive)
    )                    # End group of time options.
    \s+                  # Required whitespace.
    (?P<offset> \d+ )    # $offset: count of time increments.
    \s+                  # Required whitespace.
    (?P<units>           # $units: units of time increment.
      (?:sec(?:ond)?|min(?:ute)?|hour|day|week|month|year|decade|century)
      s?                 # Time units may have optional plural "s".
    )                    # End $units: units of time increment.
    \s+                  # Required whitespace.
    (?P<dir>from|before|after|since) # $dir: Time offset direction.
    \s+                  # Required whitespace.
    (?P<base>yesterday|today|tomorrow|(?:right\s+)?now)  # \s+ because VERBOSE ignores literal spaces
    \s*                  # Optional whitespace before end.
    $                    # Anchor to end of string.""", 
    re.IGNORECASE | re.VERBOSE)
match = reobj.match(' 3 pm 2 days from today')
if match:
    print('Time:       %s' % (match.group('time')))
    print('Offset:     %s' % (match.group('offset')))
    print('Units:      %s' % (match.group('units')))
    print('Direction:  %s' % (match.group('dir')))
    print('Base time:  %s' % (match.group('base')))
else:
    print("No match.")

Output:

r"""
Time:       3 pm
Offset:     2
Units:      days
Direction:  from
Base time:  today
"""

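This regex illustrates several points worth noting:

Regular expressions are very powerful (and useful)!
This regex does validate the numbers, but as you can see, doing so is tedious and difficult (and therefore not recommended; I show it here to demonstrate why not to do it this way). It is much simpler to just extract the numbers with the regex and then validate their ranges with ordinary code.
Named capture groups ease the pain of pulling multiple data substrings out of a larger text.
Always write regexes in free-spacing, verbose mode, with proper indentation of groups and lots of descriptive comments. This helps both while writing the regex and later during maintenance.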
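
The "extract with a regex, validate in ordinary code" advice above can be sketched as follows. This is only an illustration, not the answer's own code: the LOOSE pattern, the parse helper, and the bounds (hours 0-12 and days 1-1000, taken from the question) are all assumptions.

```python
import re

# Hypothetical, deliberately loose pattern: it only locates the pieces.
LOOSE = re.compile(
    r"""
    ^\s*
    (?P<hour>\d{1,2})        # hour digits; range is checked in code below
    \s*
    (?P<ampm>am|pm)
    \s+
    (?P<offset>\d+)          # number of days; range is checked in code below
    \s+ days? \s+ from \s+
    (?P<base>yesterday|today|tomorrow)
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)

def parse(text):
    """Return a dict of validated fields, or None if text is not acceptable."""
    m = LOOSE.match(text)
    if m is None:
        return None
    hour = int(m.group('hour'))
    offset = int(m.group('offset'))
    # Range checks in plain Python (bounds assumed from the question).
    if not 0 <= hour <= 12 or not 1 <= offset <= 1000:
        return None
    return {'hour': hour, 'ampm': m.group('ampm').lower(),
            'offset': offset, 'base': m.group('base').lower()}

print(parse(' 3pm 2 days from today'))
print(parse(' 13pm 2 days from today'))   # hour out of range
```

Changing an allowed range then means editing one comparison rather than rewriting an alternation.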

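Modern regular expressions are a rich and powerful language. Once you learn the syntax and get into the habit of writing verbose, properly indented, well-commented code, even complex regexes like the one above become easy to write, read and maintain. Unfortunately, regexes have a reputation for being difficult to use, unwieldy and error-prone (and are therefore often not recommended for complex tasks).

Happy regexing!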

Answered 2025-04-16 by Python大师
