Separating two datetime values from a single string

2 answers

User

Answer 1 · edited 2024-04-19 07:51:36

Use re.findall():

import re

text = "2019-08-03-2019-08-09"
match = re.findall(r'\d{4}-\d{2}-\d{2}', text)

print(match)

Output:

['2019-08-03', '2019-08-09']

An example with other text between the dates:

import re

text = "2019-08-03-2019-08-09xxxxxThis is test xxxxx -2017-01-01"
match = re.findall(r'\d{4}-\d{2}-\d{2}', text)

print(match)

Output:

['2019-08-03', '2019-08-09', '2017-01-01']
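If date objects are needed rather than strings, the matches can be handed to a parser. A minimal sketch, assuming every match is a real calendar date in YYYY-MM-DD form and Python 3.7 or later (for date.fromisoformat); the helper name extract_dates is made up for illustration:

```python
import re
from datetime import date


def extract_dates(text):
    # Hypothetical helper: find every YYYY-MM-DD substring and convert it.
    # date.fromisoformat raises ValueError for impossible dates such as
    # "2019-13-45", which the regex alone would happily match.
    return [date.fromisoformat(s) for s in re.findall(r"\d{4}-\d{2}-\d{2}", text)]


start, end = extract_dates("2019-08-03-2019-08-09")
print(start, end)
```

Note that this pattern only covers plain dates; strings with a time or a time zone offset need the approach from the other answer.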

User

Answer 2 · edited 2024-04-19 07:51:36

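I think the best approach is to have your client change the delimiter from - to something else, such as a space, a tab, or anything that does not show up in ISO 8601 strings, and then split on that. If you must use - as the delimiter and must support any valid ISO 8601 string, your best option is to look for the pattern -(--|\d{4}), because every valid ISO 8601 datetime starts either with four digits or with --. If you find a dash followed by four digits, you have found either a negative time zone offset or the start of the next ISO 8601 datetime.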
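To make the ambiguity concrete, the sketch below lists every "dash followed by four digits" in a sample string with negative offsets. The sample string is made up for illustration, and only the four-digit half of the pattern is exercised:

```python
import re

# Two datetimes with a -0400 offset each, joined by "-".
s = "2019-08-03T10:00:00-0400-2019-08-09T10:00:00-0400"

# Each hit is either a time zone offset or the start of a datetime.
candidates = [(m.start(), m.group()) for m in re.finditer(r"-\d{4}", s)]
print(candidates)
# [(19, '-0400'), (24, '-2019'), (44, '-0400')]
```

The hit at index 19 is an offset, the one at index 24 is the real delimiter, and the last one is the second offset, so the pattern alone is not enough to pick the split point.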

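Additionally, no valid ISO 8601 datetime format contains \d{4}-\d{4}. If the -(\d{4}) you find is a time zone offset, it must be at the end of the first ISO 8601 string, where it is directly followed by the delimiter and the four-digit year of the second one. A negative lookahead is therefore enough to make sure the pattern does not repeat. Putting it together: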

import re
from dateutil.parser import isoparse


def parse_iso8601_pairs(isostr):
    # In a string containing two ISO 8601 strings delimited by -, the substring
    # "-\d{4}" is only found at the beginning of the second datetime or the
    # end of *either* datetime. If it is found at the end of the first datetime,
    # it will always be followed by `-\d{4}`, so we can use negative lookahead
    # to find the beginning of the next string.
    #
    # Note: ISO 8601 datetimes can also begin with `--`, but parsing these is
    # not supported yet in dateutil.parser.isoparse, as of version 2.8.0. The
    # regex includes this type of string in order to make at least the splitting
    # method work even if the parsing method doesn't support "missing year"
    # ISO 8601 strings.
    m = re.search(r"-(--|\d{4})(?!-(--|\d{4}))", isostr)
    if m is None:
        # The f prefix belongs on the part that contains the placeholder.
        raise ValueError("String does not contain two ISO 8601 datetimes "
                         f"delimited by -: {isostr}")

    split_on = m.span()[0]
    str1 = isostr[0:split_on]
    str2 = isostr[split_on + 1:]

    # You may want to wrap the error handling here with a nicer message
    dt1 = isoparse(str1)
    dt2 = isoparse(str2)

    return dt1, dt2

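As far as I know, this works for any pair of ISO 8601-compliant strings delimited by -, except for the ambiguous "missing year" format, --MM-?DD. The splitting part of the code works even for strings like --04-01, but dateutil.parser.isoparse does not currently support this format, so the parsing will fail.

Possibly more problematic is that --MMDD is also a valid ISO 8601 format, and it will match -\d{4} and give the wrong split. If you want to support this format and have a modified parser that can handle --MMDD, I am sure you could craft a more complicated regex for the --MMDD case (if anyone wants to do so, I will happily edit it into this post). Alternatively, you can simply "guess and check": iterate over the matches with re.finditer until you find a position where splitting the string yields a valid ISO 8601 datetime on both sides of the delimiter.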
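The guess-and-check idea can be sketched as follows. This is only an illustration: split_by_guess_and_check is a made-up name, the candidate pattern uses a lookahead so that neighbouring candidates do not consume each other, and datetime.datetime.fromisoformat stands in for the parser. It does not handle --MMDD, so a modified parser would have to be passed as parse to cover that case.

```python
import re
from datetime import datetime


def split_by_guess_and_check(isostr, parse=datetime.fromisoformat):
    # Candidate delimiters: any "-" followed by "--" or by four digits.
    for m in re.finditer(r"-(?=--|\d{4})", isostr):
        left = isostr[:m.start()]
        right = isostr[m.start() + 1:]
        try:
            # Accept the first position where both halves parse.
            return parse(left), parse(right)
        except ValueError:
            continue  # not a valid split point, try the next candidate
    raise ValueError(f"No valid split point found in: {isostr}")


dt1, dt2 = split_by_guess_and_check(
    "2019-08-03T10:00:00-04:00-2019-08-09T12:30:00-04:00")
print(dt1.isoformat(), dt2.isoformat())
```

The trade-off is that every candidate costs up to two parse attempts, which is usually negligible for strings this short.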

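Note: this approach also works if you replace dateutil.parser.isoparse with datetime.datetime.fromisoformat. The difference is that the strings fromisoformat parses are mostly a subset of what isoparse handles: it is the inverse of datetime.datetime.isoformat and parses anything that can be produced by calling isoformat on a datetime object, whereas isoparse is meant to parse any valid ISO 8601 string. (From Python 3.11 on, fromisoformat accepts most ISO 8601 formats as well.) If you know your datetimes were generated by calling isoformat(), fromisoformat is the better choice of ISO 8601 parser.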
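A standard-library-only variant might look like the sketch below. It assumes both halves were produced by isoformat(), so each one starts with a four-digit year and any offset is written as +HH:MM or -HH:MM; under that assumption the pattern is reduced to the four-digit case. The name parse_isoformat_pair is made up for illustration.

```python
import re
from datetime import datetime


def parse_isoformat_pair(isostr):
    # First "-" followed by four digits that is not itself followed by
    # another "-" plus four digits: the start of the second datetime.
    m = re.search(r"-\d{4}(?!-\d{4})", isostr)
    if m is None:
        raise ValueError(f"No delimiter found in: {isostr}")
    split_on = m.start()
    return (datetime.fromisoformat(isostr[:split_on]),
            datetime.fromisoformat(isostr[split_on + 1:]))


# Round trip: join two isoformat() strings with "-" and split them again.
start = datetime(2019, 8, 3, 10, 0)
end = datetime(2019, 8, 9, 18, 30)
joined = f"{start.isoformat()}-{end.isoformat()}"
print(parse_isoformat_pair(joined) == (start, end))
```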
