Calculating the time between dates in Python
Say I have two lists that look roughly like this:
```python
L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male', 'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male', 'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female', 'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male', 'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male', 'James, Chelsea, 2008, 3, Female']
```
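What I want to do is compare the date of each family member (same last name) with the date of the "John" in their family. The dates come in several formats: some have year, month and day, some only year and month, and some only the year. I want to find the difference between John's date and each family member's date as precisely as possible (if one date has year, month and day and the other only year and month, then I only compute the difference in years and months). This is what I have tried so far. It doesn't work, because it uses the wrong names and dates (each John is given only one sibling), and the way it computes the time between dates is confusing and incorrect: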
```python
for line in L1:
    type=line.split(',')
    if len(type)>=1:
        family=type[0]
        if len(type)==6:
            yearA=type[2]
            monthA=type[3]
            dayA=type[4]
            sex=type[5]
            print '%s, John Published in %s, %s, %s, %s' %(family, yearA, monthA, dayA, sex)
        elif len(type)==5:
            yearA=type[2]
            monthA=type[3]
            sex=type[4]
            print '%s, John Published in %s, %s, %s' %(family, yearA, monthA, sex)
        elif len(type)==4:
            yearA=type[2]
            sex=type[3]
            print '%s, John Published in %s, %s' %(family, yearA, sex)
        for line in L2:
            if re.search(family, line):
                word=line.split(',')
                name=word[1]
                if len(word)==6:
                    yearB=word[2]
                    monthB=word[3]
                    dayB=word[4]
                    sex=word[5]
                elif len(word)==5:
                    yearB=word[2]
                    monthB=word[3]
                    sex=word[4]
                elif len(word)==4:
                    yearB=word[2]
                    sex=word[3]
                if dayA and dayB:
                    yeardiff= int(yearA)-int(yearB)
                    monthdiff=int(monthA)-int(monthB)
                    daydiff=int(dayA)-int(dayB)
                    print'%s, %s Published %s year(s), %s month(s), %s day(s) before/after John, %s' %(family, name, yeardiff, monthdiff, daydiff, sex)
                elif not dayA and not dayB and monthA and monthB:
                    yeardiff= int(yearA)-int(yearB)
                    monthdiff=int(monthA)-int(monthB)
                    print'%s, %s Published %s year(s), %s month(s), before/after John, %s' %(family, name, yeardiff, monthdiff, sex)
                elif not monthA and not monthB and yearA and yearB:
                    yeardiff= int(yearA)-int(yearB)
                    print'%s, %s Published %s year(s), before/after John, %s' %(family, name, yeardiff, sex)
```
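I'd like the end result to look like the following. If possible, I'd also like the program to tell whether each sibling is earlier or later than John, and to print months and days only when both of the compared dates have them: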
```
Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Kevin Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Matt Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Carol Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Sue Published _ year(s) _month(s) _day(s) before/after John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published _ year(s) _month(s) _day(s) before/after John, Male
Johnson, Emma Published _ year(s) _month(s) _day(s) before/after John, Female
James, John Published in 2008, 3, Male
James, Peter Published _ year(s) _month(s) _day(s) before/after John, Male
James, Chelsea Published _ year(s) _month(s) _day(s) before/after John, Female
```
3 Answers
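There may already be a module that handles this, but I'd suggest first converting the dates into a common unit of time (for example, the number of days since January 1, 19XX). Then you can easily compare them, subtract them and so on, and finally convert the result back into whatever form you want to display. If you only want the result in days, this should be fairly simple.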
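A minimal sketch of that idea using only the standard library. It assumes a missing month or day is filled in as 1 (some such choice has to be made), and `to_day_number` is a hypothetical helper name. `date.toordinal()` turns a date into a single day count, so a difference becomes plain integer subtraction:

```python
from datetime import date


def to_day_number(year, month=None, day=None):
    """Map a possibly partial date onto one integer day count.

    Assumption: a missing month or day is treated as 1, so (2008, 3)
    becomes 2008-03-01. date.toordinal() counts 0001-01-01 as day 1.
    """
    return date(year, month or 1, day or 1).toordinal()


john = to_day_number(2009, 1, 28)   # Johnson, John
alex = to_day_number(2008, 3)       # Johnson, Alex (no day given)
print(john - alex)                  # 333
```

Converting back to plain days is trivial; converting back to years and months is where the calendar gets in the way.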
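As Joe Kington suggested, the dateutil module is very useful for this. In particular, its relativedelta can give you the difference between two dates in years, months and days. (If you compute it yourself, you also have to deal with leap years and the like. Using a well-tested module is much better than reinventing this wheel yourself.)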
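To see why leap years matter, here is a small standard-library check using Carol's and John Smith's dates from the question: dividing a day count by 365 overstates the gap by a whole year.

```python
from datetime import date

# Two "one calendar year" spans with different lengths:
# the second one contains 29 February 2008.
print((date(2008, 1, 1) - date(2007, 1, 1)).days)  # 365
print((date(2009, 1, 1) - date(2008, 1, 1)).days)  # 366

# Carol (2000-12-11) to John Smith (2008-12-10) is one day short of
# 8 calendar years, but the span contains two leap days (2004 and 2008),
# so a naive "days // 365" still reports 8 years.
days = (date(2008, 12, 10) - date(2000, 12, 11)).days
print(days, days // 365)  # 2921 8
```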
This problem lends itself to classes.
Let's create a Person class that records a person's name, gender and publication date:
```python
class Person(object):
    def __init__(self, lastname, firstname, gender=None, year=None, month=None, day=None):
        self.lastname = lastname
        self.firstname = firstname
        self.ymd = VagueDate(year, month, day)
        self.gender = gender
```
A publication date may have missing parts, so we need a dedicated class for dates with incomplete data:
```python
import dateutil.relativedelta as relativedelta


class VagueDate(object):
    def __init__(self, year=None, month=None, day=None):
        self.year = year
        self.month = month
        self.day = day

    def __sub__(self, other):
        d1 = self.asdate()
        d2 = other.asdate()
        rd = relativedelta.relativedelta(d1, d2)
        years = rd.years
        # Only report months/days when both dates actually have them.
        months = rd.months if self.month and other.month else None
        days = rd.days if self.day and other.day else None
        return VagueDateDelta(years, months, days)
```
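The datetime module defines datetime.datetime objects and uses datetime.timedelta objects to represent the difference between two datetime.datetime objects. Analogously, let's define a VagueDateDelta to represent the difference between two VagueDates: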
```python
class VagueDateDelta(object):
    def __init__(self, years=None, months=None, days=None):
        self.years = years
        self.months = months
        self.days = days

    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)
```
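For comparison, this is the standard-library behaviour that VagueDateDelta is modelled on: subtracting two `datetime.date` objects gives a `datetime.timedelta`, and adding it back gives the later date. A `timedelta` only stores days, seconds and microseconds, which is why a separate delta type is needed to carry years and months:

```python
from datetime import date

a = date(2008, 1, 31)
b = date(2008, 3, 1)

delta = b - a                             # date - date -> datetime.timedelta
print(type(delta).__name__, delta.days)   # timedelta 30
print(a + delta == b)                     # True: date + timedelta -> date
print(hasattr(delta, 'months'))           # False: no month (or year) field
```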
Now that we have built a few handy tools, solving the problem isn't hard.
The first step is to parse the lists of strings into Person objects:
```python
def parse_person(text):
    data = [field.strip() for field in text.split(',')]
    lastname = data[0]
    firstname = data[1]
    gender = data[-1]
    ymd = [int(x) for x in data[2:-1]]  # 1 to 3 date fields
    return Person(lastname, firstname, gender, *ymd)

johns = list(map(parse_person, L1))
peeps = list(map(parse_person, L2))
```
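The `*ymd` unpacking is what lets one parser handle all three date formats: a record with two numbers fills `year` and `month` and leaves `day` at its default of `None`. A self-contained illustration, with a hypothetical `date_parts` function standing in for Person's optional parameters:

```python
def date_parts(year=None, month=None, day=None):
    # Hypothetical stand-in for Person/VagueDate: optional parameters
    # that stay None when a field is absent.
    return (year, month, day)


for text in ['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male']:
    fields = [f.strip() for f in text.split(',')]
    ymd = [int(x) for x in fields[2:-1]]  # the 1 to 3 numbers between name and gender
    print(fields[0], date_parts(*ymd))
# Smith (2008, 12, 10)
# Bates (2006, 1, None)
```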
Next, we regroup peeps into a dictionary of families:
```python
import collections

family = collections.defaultdict(list)
for person in peeps:
    family[person.lastname].append(person)
```
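Because `family` is a `defaultdict(list)`, looking up a surname that never appeared (such as Bates, who has no relatives in L2) returns an empty list instead of raising `KeyError`, so the final loop needs no special case. A small standalone sketch of the same grouping on raw strings (`records` is just a subset of L2):

```python
import collections

records = ['Smith, Joy, 2008, 12, 10, Female', 'Johnson, Alex, 2008, 3, Male',
           'Smith, Carol, 2000, 12, 11, Female', 'James, Peter, 2008, 3, Male']

family = collections.defaultdict(list)
for line in records:
    surname = line.split(',')[0].strip()
    family[surname].append(line)   # a missing key starts out as an empty list

print(sorted(family))          # ['James', 'Johnson', 'Smith']
print(len(family['Smith']))    # 2
print(family['Bates'])         # [] (reading a missing key also inserts it)
```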
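Finally, you just iterate over each john and his family members, compare the publication dates, and report the results.
The complete script might look like this: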
```python
import datetime as dt
import dateutil.relativedelta as relativedelta
import pprint
import collections


class VagueDateDelta(object):
    def __init__(self, years=None, months=None, days=None):
        self.years = years
        self.months = months
        self.days = days

    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)


class VagueDate(object):
    def __init__(self, year=None, month=None, day=None):
        self.year = year
        self.month = month
        self.day = day

    def __sub__(self, other):
        d1 = self.asdate()
        d2 = other.asdate()
        rd = relativedelta.relativedelta(d1, d2)
        years = rd.years
        # Only report months/days when both dates actually have them.
        months = rd.months if self.month and other.month else None
        days = rd.days if self.day and other.day else None
        return VagueDateDelta(years, months, days)

    def asdate(self):
        # You've got to make some kind of arbitrary decision when comparing
        # vague dates. Here I make the arbitrary decision that missing info
        # will be treated like 1s for the purpose of calculating differences.
        return dt.date(self.year, self.month or 1, self.day or 1)

    def __str__(self):
        if self.day is not None and self.month is not None:
            return '{s.year}, {s.month}, {s.day}'.format(s=self)
        elif self.month is not None:
            return '{s.year}, {s.month}'.format(s=self)
        else:
            return '{s.year}'.format(s=self)


class Person(object):
    def __init__(self, lastname, firstname, gender=None, year=None, month=None, day=None):
        self.lastname = lastname
        self.firstname = firstname
        self.ymd = VagueDate(year, month, day)
        self.gender = gender

    def age_diff(self, other):
        return self.ymd - other.ymd

    def __str__(self):
        fmt = '{s.lastname}, {s.firstname} ({s.gender}) ({d.year},{d.month},{d.day})'
        return fmt.format(s=self, d=self.ymd)

    __repr__ = __str__

    def __lt__(self, other):
        d1 = self.ymd.asdate()
        d2 = other.ymd.asdate()
        return d1 < d2


def parse_person(text):
    data = [field.strip() for field in text.split(',')]
    lastname = data[0]
    firstname = data[1]
    gender = data[-1]
    ymd = [int(x) for x in data[2:-1]]  # 1 to 3 date fields
    return Person(lastname, firstname, gender, *ymd)


def main():
    L1 = ['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male',
          'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
    L2 = ['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male',
          'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female',
          'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male',
          'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male',
          'James, Chelsea, 2008, 3, Female']
    johns = list(map(parse_person, L1))
    peeps = list(map(parse_person, L2))
    print(pprint.pformat(johns))
    print()
    print(pprint.pformat(peeps))
    print()

    family = collections.defaultdict(list)
    for person in peeps:
        family[person.lastname].append(person)

    pub_fmt = '{j.lastname}, {j.firstname} Published in {j.ymd}, {j.gender}'
    rel_fmt = ' {r.lastname}, {r.firstname} Published {d} {ba} John, {r.gender}'
    for john in johns:
        print(pub_fmt.format(j=john))
        for relative in family[john.lastname]:
            before = relative < john
            # Subtract the earlier date from the later one so the reported
            # years/months/days are never negative.
            diff = john.ymd - relative.ymd if before else relative.ymd - john.ymd
            print(rel_fmt.format(
                r=relative,
                d=diff,
                ba='before' if before else 'after',
            ))


if __name__ == '__main__':
    main()
```
The output is:
```
[Smith, John (Male) (2008,12,10),
Bates, John (Male) (2006,1,None),
Johnson, John (Male) (2009,1,28),
James, John (Male) (2008,3,None)]

[Smith, Joy (Female) (2008,12,10),
Smith, Kevin (Male) (2008,12,10),
Smith, Matt (Male) (2008,12,10),
Smith, Carol (Female) (2000,12,11),
Smith, Sue (Female) (2000,12,11),
Johnson, Alex (Male) (2008,3,None),
Johnson, Emma (Female) (2008,3,None),
James, Peter (Male) (2008,3,None),
James, Chelsea (Female) (2008,3,None)]

Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published 0 years, 0 months, 0 days after John, Female
Smith, Kevin Published 0 years, 0 months, 0 days after John, Male
Smith, Matt Published 0 years, 0 months, 0 days after John, Male
Smith, Carol Published 7 years, 11 months, 29 days before John, Female
Smith, Sue Published 7 years, 11 months, 29 days before John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published 0 years, 10 months before John, Male
Johnson, Emma Published 0 years, 10 months before John, Female
James, John Published in 2008, 3, Male
James, Peter Published 0 years, 0 months after John, Male
James, Chelsea Published 0 years, 0 months after John, Female
```
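As mentioned in the comments (on @Matt's answer), you need at least a year, month and day to use datetime.date and datetime.timedelta. Judging by the sample data above, some entries may be missing the day, which makes things a lot more complicated.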
If you don't mind using a default month/day (say, January 1), you can quickly convert these dates into datetime.date instances.
Here's a quick example:
```python
from datetime import date

johns = []
for s in L1:
    # NOTE: not the most robust parsing method.
    v = [x.strip() for x in s.split(",")]
    data = {
        "gender": v[-1],
        "last_name": v[0],
        "first_name": v[1],
    }
    # build keyword args for datetime.date()
    v = v[2:-1]  # remove parsed data
    kwargs = {"year": int(v.pop(0)), "month": 1, "day": 1}
    try:
        kwargs["month"] = int(v.pop(0))
        kwargs["day"] = int(v.pop(0))
    except IndexError:
        pass  # month and/or day missing: keep the default of 1
    data["date"] = date(**kwargs)
    johns.append(data)
```
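This gives you a list of dicts holding the name, gender and date. You can do the same for L2, and compute the difference between two dates by subtracting one date from the other (which produces a timedelta object).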
```python
>>> from datetime import date
>>> a = date(2008, 12, 12)
>>> b = date(2010, 1, 13)
>>> delta = b - a
>>> print(delta.days)
397
>>> print("%d years, %d days" % divmod(delta.days, 365))
1 years, 32 days
```
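I deliberately left out months, because treating a month as 30 days isn't straightforward. Arguably, assuming a year has 365 days is inaccurate too, since leap years have to be taken into account.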
Update: showing the difference in years, months and days
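If you need to show the difference in years, months and days, applying divmod directly to the number of days in the timedelta can be inaccurate, because it ignores leap years and the fact that months have different numbers of days. You have to compare the years, months and days of the two dates yourself.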
Here's my attempt at such a function. (It has only been lightly tested, so use it with caution.)
```python
from datetime import date, timedelta


def my_time_delta(d1, d2):
    """
    Returns time delta as the following tuple:
    ("before|after|same", "years", "months", "days")
    where the first item says whether d1 is before or after d2.
    """
    if d1 == d2:
        return ("same", 0, 0, 0)

    # d1 before or after d2?
    if d1 > d2:
        ba = "after"
        d1, d2 = d2, d1  # swap so d2 > d1
    else:
        ba = "before"

    years = d2.year - d1.year
    months = d2.month - d1.month
    days = d2.day - d1.day

    # adjust for -ve days/months
    if days < 0:
        # Borrow one month: count whole months forward from d1, then the
        # leftover days. The borrowed month is the one just before d2's
        # month; if d1.day does not exist in it (e.g. the 31st vs. a
        # 30-day month), the month count stops at that month's last day.
        last_of_prev_month = d2.replace(day=1) - timedelta(days=1)
        days = d2.day + max(last_of_prev_month.day - d1.day, 0)
        months = months - 1
    if months < 0:
        months = months + 12
        years = years - 1

    return (ba, years, months, days)
```
Example usage:
```python
>>> my_time_delta(date(2003, 12, 1), date(2003, 11, 2))
('after', 0, 0, 29)
>>> my_time_delta(date(2003, 12, 1), date(2004, 11, 2))
('before', 0, 11, 1)
>>> my_time_delta(date(2003, 2, 1), date(1992, 3, 10))
('after', 10, 10, 22)
>>> p, y, m, d = my_time_delta(date(2003, 2, 1), date(1992, 3, 10))
>>> print("%d years, %d months, %d days %s" % (y, m, d, p))
10 years, 10 months, 22 days after
```
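None of the code above implements the question's "only compare the fields both dates have" rule directly without a third-party module. A sketch of that rule, assuming each date is given as a tuple of 1 to 3 integers such as `(2008, 12, 10)`, `(2008, 3)` or `(2006,)`; `common_precision_diff` is a hypothetical name. Days are borrowed by counting whole months forward from the earlier date and then the leftover days; on Carol's dates this gives the same 7 years, 11 months, 29 days as the relativedelta-based script in the first answer.

```python
import calendar


def common_precision_diff(john, other):
    """Compare two partial dates using only the fields both of them have.

    Each date is a tuple of 1 to 3 ints: (year,), (year, month) or
    (year, month, day). Returns (label, years, months, days): label says
    whether `other` is 'before', 'after' or 'same as' `john`; months/days
    are None when that precision is not available on both sides.
    """
    n = min(len(john), len(other))
    a, b = tuple(john[:n]), tuple(other[:n])  # truncate to the common precision
    if b < a:                                 # tuples compare field by field
        label = 'before'
    elif b > a:
        label = 'after'
    else:
        label = 'same as'
    lo, hi = min(a, b), max(a, b)

    if n == 1:
        return label, hi[0] - lo[0], None, None

    months = (hi[0] - lo[0]) * 12 + (hi[1] - lo[1])
    days = None
    if n == 3:
        days = hi[2] - lo[2]
        if days < 0:
            # Borrow the month before hi's month; if lo's day does not exist
            # in it, the whole-month count stops at that month's last day.
            prev_year, prev_month = (hi[0], hi[1] - 1) if hi[1] > 1 else (hi[0] - 1, 12)
            prev_len = calendar.monthrange(prev_year, prev_month)[1]
            days = hi[2] + max(prev_len - lo[2], 0)
            months -= 1
    years, months = divmod(months, 12)
    return label, years, months, days


print(common_precision_diff((2008, 12, 10), (2000, 12, 11)))  # ('before', 7, 11, 29)
print(common_precision_diff((2009, 1, 28), (2008, 3)))        # ('before', 0, 10, None)
print(common_precision_diff((2008, 3), (2008, 3)))            # ('same as', 0, 0, None)
```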