合并有重叠时间范围的时间区间元组列表
我有一个包含多个元组的列表,每个元组都是一个 (开始时间, 结束时间)
。我想把所有重叠的时间段合并起来,并返回一个不重复的时间段列表。比如说:
[(1, 5), (2, 4), (3, 6)] ---> [(1,6)]
[(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)]
这是我实现这个功能的方法。
# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ...
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d: a-------b Ans: [(a,b),(c,d)]
# c---d
# c<=b<d: a-------b Ans: [(a,d)]
# c---d
# c<d<b: a-------b Ans: [(a,b)]
# c---d
#================================================
def mergeoverlapping(initialranges):
i = sorted(set([tuple(sorted(x)) for x in initialranges]))
# initialize final ranges to [(a,b)]
f = [i[0]]
for c, d in i[1:]:
a, b = f[-1]
if c<=b<d:
f[-1] = a, d
elif b<c<d:
f.append((c,d))
else:
# else case included for clarity. Since
# we already sorted the tuples and the list
# only remaining possibility is c<d<b
# in which case we can silently pass
pass
return f
我想弄清楚:
- 在某个Python模块中有没有内置的函数可以更高效地完成这个任务?或者
- 有没有更符合Python风格的方法来实现同样的目标?
感谢你的帮助!谢谢!
6 个回答
1
先把所有的边界排序,然后找出所有一对一对的边界,条件是一个边界的结束后面紧跟着另一个边界的开始。
def mergeOverlapping(initialranges):
def allBoundaries():
for r in initialranges:
yield r[0], True
yield r[1], False
def getBoundaries(boundaries):
yield boundaries[0][0]
for i in range(1, len(boundaries) - 1):
if not boundaries[i][1] and boundaries[i + 1][1]:
yield boundaries[i][0]
yield boundaries[i + 1][0]
yield boundaries[-1][0]
return getBoundaries(sorted(allBoundaries()))
嗯,虽然看起来不太美观,但写这个代码还是挺有趣的!
补充:几年后,在收到一个赞之后,我才意识到我的代码是错的!这是新的版本,纯粹是为了好玩:
def mergeOverlapping(initialRanges):
def allBoundaries():
for r in initialRanges:
yield r[0], -1
yield r[1], 1
def getBoundaries(boundaries):
openrange = 0
for value, boundary in boundaries:
if not openrange:
yield value
openrange += boundary
if not openrange:
yield value
def outputAsRanges(b):
while b:
yield (b.next(), b.next())
return outputAsRanges(getBoundaries(sorted(allBoundaries())))
基本上,我用-1和1来标记边界,然后按值排序,只有在打开和关闭的括号数量平衡为零的时候才输出它们。
2
先对元组进行排序,然后对列表进行排序,如果t1的右边大于等于t2的左边,就合并这两个元组,然后用新的列表重新开始处理……
-->
def f(l, sort = True):
if sort:
sl = sorted(tuple(sorted(i)) for i in l)
else:
sl = l
if len(sl) > 1:
if sl[0][1] >= sl[1][0]:
sl[0] = (sl[0][0], sl[1][1])
del sl[1]
if len(sl) < len(l):
return f(sl, False)
return sl
18
有几种方法可以让代码更高效,更符合Python的风格:
- 去掉
set()
的构造,因为算法应该在主循环中就能自动去掉重复的项。 - 如果你只是想遍历结果,可以使用
yield
来生成值。 - 减少中间对象的创建,比如:把
tuple()
的调用放到生成最终值的地方,这样就不用创建和丢弃多余的元组,同时可以重用一个列表saved
来存储当前的时间范围以便比较。
代码:
def merge(times):
saved = list(times[0])
for st, en in sorted([sorted(t) for t in times]):
if st <= saved[1]:
saved[1] = max(saved[1], en)
else:
yield tuple(saved)
saved[0] = st
saved[1] = en
yield tuple(saved)
data = [
[(1, 5), (2, 4), (3, 6)],
[(1, 3), (2, 4), (5, 8)]
]
for times in data:
print list(merge(times))