Python: recursively merging two lists of objects by datetime

Question

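Following on from my previous question, I'm now reading data from two files, A and B, into lists of AB2 objects. Each run of 2009 dates is put into the subAB list of the non-2009 item that comes just before it.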

class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []

For example:

file A: 20111225, 20111226, 20090101
file B: 20111225, 20111226, 20090101, 20090102, 20111227, 20090105

The result should look like this (the subAB list is in square brackets):

AB2(20111225, a=True, b=True, [])
AB2(20111226, a=True, b=True,
    [AB2(20090101, a=True, b=True, []),
     AB2(20090102, a=False, b=True, [])])
AB2(20111227, a=False, b=True,
    [AB2(20090105, a=False, b=True, [])])

Unfortunately, this complicates the previous solution:

list_of_objects = [(i, i in A, i in B) for i in set(A) | set(B)]

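because:

Order matters (each 2009 item belongs to the 2011 item just before it in the file)
A file can contain several items with the same time
I now also have to handle the subAB lists

For these reasons I can't just use sets, because they drop duplicates and lose the ordering. I tried an OrderedSet approach, but I couldn't see how to apply it here.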
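A quick illustration of why sets don't fit, using the fileA log shown later in the question (dates written as integers for brevity). An ordered set is approximated here with `dict.fromkeys`, which keeps insertion order in Python 3.7+; it fixes the ordering but still collapses the repeated 2009/01/01 entries, which are exactly the rapid reboots being counted.

```python
# fileA sample log from the question, dates only, as YYYYMMDD integers.
log_a = [20111215, 20111217, 20111219,
         20090101, 20090101, 20090104, 20090101,
         20111220, 20111225]

# A plain set loses both the order and the repeated reboots.
print(len(log_a), len(set(log_a)))  # 9 7

# An insertion-ordered "set" keeps first-seen order but still drops repeats.
print(list(dict.fromkeys(log_a)))
```

So any order-preserving structure that deduplicates still throws away information here; the lists have to be merged item by item instead.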

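My current code:

def load_and_combine(fileA, fileB):
    listA = open_and_parse(fileA)  # list of parsed datetimes
    listAObjects = [AB2(dt, True, None) for dt in listA]  # AB2 objects from file A
    nested_listAObjects = nest(listAObjects)  # moves 2009 objects into the preceding 2011 ones
    listB = open_and_parse(fileB)  # same for file B
    listBObjects = [AB2(dt, None, True) for dt in listB]
    nested_listBObjects = nest(listBObjects)
    return combine(nested_listAObjects, nested_listBObjects)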

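The nest method (puts 2009 items into the preceding 2011 item; any 2009 items at the very start of the file are ignored):

def nest(items):
    previous = None  # most recent post-2009 item seen so far
    for item in items:
        if item.datetime.year <= 2009:
            # attach to the preceding post-2009 item, if there is one
            if previous is not None:
                previous.subAB.append(item)
        else:
            previous = item

    return [item for item in items if item.datetime.year > 2009]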
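Here nest is run on the fileA sample log shown later in the question, as a self-contained check. Note that the `else` is attached to the year test, so `previous` always tracks the last post-2009 item. The string format `"%Y/%m/%d"` is an assumption matching that sample; the real logs also carry a time of day.

```python
from datetime import datetime


class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []


def nest(items):
    previous = None  # most recent post-2009 item seen so far
    for item in items:
        if item.datetime.year <= 2009:
            if previous is not None:  # leading 2009 items are dropped
                previous.subAB.append(item)
        else:
            previous = item
    return [item for item in items if item.datetime.year > 2009]


raw = ["2011/12/15", "2011/12/17", "2011/12/19",
       "2009/01/01", "2009/01/01", "2009/01/04", "2009/01/01",
       "2011/12/20", "2011/12/25"]
objs = [AB2(datetime.strptime(s, "%Y/%m/%d"), True, None) for s in raw]
nested = nest(objs)

for parent in nested:
    print(parent.datetime.date(), [str(c.datetime.date()) for c in parent.subAB])
# 2011-12-19 gets all four 2009 entries; every other parent gets []
```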

But I'm a bit stuck on my combine function:

def combine(nestedA, nestedB):
    combined = nestedA + nestedB
    combined.sort(key=lambda x: x.datetime)

    <magic>

    return combined

At this point, without any special handling, combined would look like this:

AB2(20111225, a=True, b=None, [])  # \
AB2(20111225, a=None, b=True, [])  # / these two should merge to AB2(20111225, a=True, b=True, [])
AB2(20111226, a=True, b=None,
    [AB2(20090101, a=True, b=None, [])])
AB2(20111226, a=None, b=True,
    [AB2(20090101, a=None, b=True, []),
     AB2(20090102, a=None, b=True, [])])
# The two entries above should merge, and so should their subAB lists (recursing only to that level, not infinitely)
AB2(20111227, a=None, b=True,
    [AB2(20090105, a=None, b=True, [])])

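I hope it's OK to post a new question, since this needs a completely different solution. Sorry for the long post; I thought it would be better to explain everything I'm doing so you can fully understand the problem and perhaps suggest an overall solution, not just one for the combine method. Thanks!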

EDIT: Clarification:

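Basically, I'm checking the logs of two connected computers to see whether they both shut down at a particular time, or only one of them did. If a computer reboots before it can get the real 2012 time, it starts up with a 2009 time (not necessarily January 1; sometimes January 4, etc.). So I'm trying to associate the later 2009 shutdowns with the shutdown record before them, so I can tell whether the machines are rebooting rapidly.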

The 2011/2012 dates should be in order, but the 2009 dates are not. One computer's log file (fileA in my example) might look like this:

2011/12/15
2011/12/17
2011/12/19 # Something goes wrong, and causes the computer to reset 5 times rapidly
2009/01/01 
2009/01/01
2009/01/04
2009/01/01
2011/12/20 # And everything is better again
2011/12/25

In reality these are datetimes (e.g. 2009/01/01 01:57:01), so I can simply check whether two datetimes are within a certain timedelta of each other.
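A minimal sketch of that tolerance check. The two-minute `TOLERANCE` is a hypothetical value, not something from the logs; it should be tuned to how far apart the two machines' records actually drift.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance: records closer than this count as the same event.
TOLERANCE = timedelta(minutes=2)


def same_event(dt1, dt2, tolerance=TOLERANCE):
    """Return True if the two datetimes are within `tolerance` of each other."""
    return abs(dt1 - dt2) <= tolerance


a = datetime(2009, 1, 1, 1, 57, 1)
b = datetime(2009, 1, 1, 1, 58, 30)
print(same_event(a, b))                          # True (89 seconds apart)
print(same_event(a, b, timedelta(seconds=30)))   # False
```

One caveat: unlike `==`, "within a tolerance" is not transitive, so when walking a sorted list it is safer to compare each item against the first item of the current group rather than against its immediate neighbour.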

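I'm hoping for either a cleaner overall solution or a specific solution for merging these two lists of AB2 objects.

The simplest way to merge the two lists is to iterate over the sorted combined list (with the 2009 objects already nested in their parents), check whether the next item has the same date as the current one, and build a new list from them.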
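The walk described above can be sketched as follows. This is one possible combine, assuming exact datetime equality (swap in a tolerance check if needed) and treating `None` flags as `False`. The helper names `merge_flags`, `merge_sub` and `node` are hypothetical. Children are merged one level deep by matching datetimes without sorting them, since the 2009 entries are unordered. Identical timestamps within a single file are folded together too, which may or may not be what you want.

```python
from datetime import datetime


class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []


def merge_flags(target, other):
    # OR the presence flags; None (not seen in that file) counts as False.
    target.a = bool(target.a or other.a)
    target.b = bool(target.b or other.b)


def merge_sub(target_list, other_list):
    # One level only: pair children by exact datetime, keeping their order.
    used = set()  # indices in target_list already paired with an item
    for other in other_list:
        for i, t in enumerate(target_list):
            if i not in used and t.datetime == other.datetime:
                merge_flags(t, other)
                used.add(i)
                break
        else:
            # No partner found: append a normalized copy.
            target_list.append(AB2(other.datetime, bool(other.a), bool(other.b)))
            used.add(len(target_list) - 1)


def combine(nestedA, nestedB):
    # sorted() is stable, so on equal datetimes A's item stays ahead of B's.
    combined = sorted(nestedA + nestedB, key=lambda x: x.datetime)
    result = []
    for item in combined:
        if result and result[-1].datetime == item.datetime:
            # Same time as the previous output entry: fold this one into it.
            merge_flags(result[-1], item)
            merge_sub(result[-1].subAB, item.subAB)
        else:
            # New time: start a fresh entry (children copied, inputs untouched).
            fresh = AB2(item.datetime, bool(item.a), bool(item.b))
            merge_sub(fresh.subAB, item.subAB)
            result.append(fresh)
    return result


def node(y, m, d, a, b, children=()):
    x = AB2(datetime(y, m, d), a, b)
    x.subAB = list(children)
    return x


# The example from the question, already passed through nest().
nestedA = [node(2011, 12, 25, True, None),
           node(2011, 12, 26, True, None, [node(2009, 1, 1, True, None)])]
nestedB = [node(2011, 12, 25, None, True),
           node(2011, 12, 26, None, True,
                [node(2009, 1, 1, None, True), node(2009, 1, 2, None, True)]),
           node(2011, 12, 27, None, True, [node(2009, 1, 5, None, True)])]

for p in combine(nestedA, nestedB):
    print(p.datetime.date(), p.a, p.b,
          [(str(c.datetime.date()), c.a, c.b) for c in p.subAB])
# 2011-12-25 True True []
# 2011-12-26 True True [('2009-01-01', True, True), ('2009-01-02', False, True)]
# 2011-12-27 False True [('2009-01-05', False, True)]
```

This reproduces the expected result given at the top of the question.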
