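How do I parse the xsd:dateTime format?
Values of type xsd:dateTime can take a number of different forms, as described in the RELAX NG documentation.
How can I convert all of these forms into a time or datetime object?
2 Answers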
1
You can try the dateutil.parser module from python-dateutil. Alternatively, take a look at isodate (I haven't used it myself, but it looks interesting and was written specifically to handle ISO 8601).
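For illustration, here is a minimal sketch of the dateutil route. It assumes python-dateutil is installed, and the sample strings are only examples. Note that dateutil.parser.parse is deliberately lenient, so it will also accept many inputs that are not valid xsd:dateTime values.

```python
import datetime

from dateutil import parser  # third-party: pip install python-dateutil

# An explicit offset produces a timezone-aware datetime.
with_offset = parser.parse("2001-10-26T21:32:52+02:00")
print(with_offset.isoformat())

# "Z" is treated as UTC.
zulu = parser.parse("2001-10-26T19:32:52Z")
print(zulu.utcoffset())

# Without a timezone the result is a naive datetime (tzinfo is None).
naive = parser.parse("2001-10-26T21:32:52.12679")
print(naive.tzinfo, naive.microsecond)
```

If you need to reject anything that is not strictly xsd:dateTime, you would still have to validate the string yourself before handing it to the parser.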
2
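The format is actually quite restrictive, especially compared with full ISO 8601. Handling it with a regular expression is roughly equivalent to using strptime and then dealing with the timezone offset yourself (which older versions of strptime will not do for you).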
import datetime
import re

def parse_timestamp(s):
    """Returns (datetime, tz offset in minutes) or (None, None)."""
    # Raw string so the regex escape \. reaches the re module unchanged.
    m = re.match(r""" ^
        (?P<year>-?[0-9]{4}) - (?P<month>[0-9]{2}) - (?P<day>[0-9]{2})
        T (?P<hour>[0-9]{2}) : (?P<minute>[0-9]{2}) : (?P<second>[0-9]{2})
        (?P<microsecond>\.[0-9]{1,6})?
        (?P<tz>
            Z | (?P<tz_hr>[-+][0-9]{2}) : (?P<tz_min>[0-9]{2})
        )?
        $ """, s, re.X)
    if m is not None:
        values = m.groupdict()
        if values["tz"] in ("Z", None):
            tz = 0
        else:
            # Apply the sign to the whole offset so -05:30 gives -330, not -270.
            sign = -1 if values["tz_hr"].startswith("-") else 1
            tz = sign * (abs(int(values["tz_hr"])) * 60 + int(values["tz_min"]))
        if values["microsecond"] is None:
            values["microsecond"] = 0
        else:
            # Drop the leading "." and right-pad to six digits.
            values["microsecond"] = values["microsecond"][1:]
            values["microsecond"] += "0" * (6 - len(values["microsecond"]))
        values = dict((k, int(v)) for k, v in values.items()
                      if not k.startswith("tz"))
        try:
            return datetime.datetime(**values), tz
        except ValueError:
            # Out-of-range fields (hour 25, negative years, ...) end up here.
            pass
    return None, None
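This does not apply the timezone offset to the datetime, and negative years are a problem for datetime. Both issues could be fixed by using a different timestamp type, one that handles the full range xsd:dateTime requires.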
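As a rough sketch of the first point, an offset in minutes can be attached to a naive datetime with the standard library's datetime.timezone (Python 3). The helper name attach_offset is hypothetical and not part of the answer's code; it takes the same kind of (datetime, minutes) pair that the parser above returns. It does nothing for negative years, which datetime still cannot represent.

```python
import datetime

def attach_offset(naive, offset_minutes):
    """Return an aware datetime carrying a fixed UTC offset (hypothetical helper)."""
    tz = datetime.timezone(datetime.timedelta(minutes=offset_minutes))
    return naive.replace(tzinfo=tz)

# The pair a parser might return for "2001-10-26T21:32:52+02:00".
local = datetime.datetime(2001, 10, 26, 21, 32, 52)
aware = attach_offset(local, 120)
as_utc = aware.astimezone(datetime.timezone.utc)

print(aware.isoformat())   # 2001-10-26T21:32:52+02:00
print(as_utc.isoformat())  # 2001-10-26T19:32:52+00:00
```

Whether a missing timezone should be treated as UTC, as the 0 returned above suggests, or kept as a naive value is a design decision for the caller.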
valid = [
    "2001-10-26T21:32:52",
    "2001-10-26T21:32:52+02:00",
    "2001-10-26T19:32:52Z",
    "2001-10-26T19:32:52+00:00",
    #"-2001-10-26T21:32:52",
    "2001-10-26T21:32:52.12679",
]
for v in valid:
    print()
    print(v)
    r = parse_timestamp(v)
    assert all(x is not None for x in r), v
    # quick and dirty, and slightly wrong
    # (doesn't distinguish +00:00 from Z among other issues)
    # but gets through the above cases
    tz = ":".join("%02d" % x for x in divmod(r[1], 60)) if r[1] else "Z"
    if r[1] > 0:
        tz = "+" + tz
    r = r[0].isoformat() + tz
    print(r)
    assert r.startswith(v[:len("CCYY-MM-DDThh:mm:ss")]), v
print("---")
invalid = [
    "2001-10-26",
    "2001-10-26T21:32",
    "2001-10-26T25:32:52+02:00",
    "01-10-26T21:32",
]
for v in invalid:
    print(v)
    r = parse_timestamp(v)
    assert all(x is None for x in r), v
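For comparison with the strptime remark at the top of this answer, here is a hedged sketch of a strptime-only variant. It assumes Python 3.7 or later, where the %z directive accepts both "+02:00" and "Z". The names FORMATS and parse_with_strptime are made up for this example. Be aware that strptime is more lenient than xsd:dateTime (for instance it accepts a one-digit month and an offset without a colon), so this is not a strict validator.

```python
import datetime

# Candidate layouts: with/without fractional seconds, with/without offset.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
)

def parse_with_strptime(s):
    """Return a datetime (aware if an offset was given) or None."""
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(s, fmt)
        except ValueError:
            continue  # try the next layout
    return None

for text in ("2001-10-26T21:32:52+02:00", "2001-10-26T19:32:52Z",
             "2001-10-26T21:32:52.12679", "2001-10-26"):
    print(text, "->", parse_with_strptime(text))
```

Unlike the regex version, this returns an aware datetime when an offset is present, so the offset does not have to be carried separately.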