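Longest increasing subsequence
Given an input sequence, what is the best way to find the longest (not necessarily continuous) increasing subsequence?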
[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15] # input
[1, 9, 13, 15] # an example of an increasing subsequence (not the longest)
[0, 2, 6, 9, 13, 15] # longest increasing subsequence (not a unique answer)
[0, 2, 6, 9, 11, 15] # another possible solution
I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.
13 Answers
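Here is a fairly general solution that:
- runs in O(n log n) time, so it scales to large inputs,
- handles increasing, nondecreasing, decreasing and nonincreasing subsequences,
- works with any sequence object, including list, numpy.array, str and more,
- supports lists of objects and custom comparison through the key parameter, which works like the one in the builtin sorted function,
- can return the elements of the subsequence or their indices.
The code: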
from bisect import bisect_left, bisect_right
from functools import cmp_to_key

def longest_subsequence(seq, mode='strictly', order='increasing',
                        key=None, index=False):

    bisect = bisect_left if mode.startswith('strict') else bisect_right

    # compute keys for comparison just once
    rank = seq if key is None else map(key, seq)
    if order == 'decreasing':
        rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank)
    rank = list(rank)

    if not rank: return []

    lastoflength = [0] # end position of subsequence with given length
    # ranks of those end elements, kept in a parallel list so that bisect
    # does not have to rebuild it on every iteration
    tails = [rank[0]]
    predecessor = [None] # penultimate element of l.i.s. ending at given position

    for i in range(1, len(seq)):
        # seq[i] can extend a subsequence that ends with a lesser (or equal) element
        j = bisect(tails, rank[i])
        # update existing subsequence of length j or extend the longest
        if j < len(lastoflength):
            lastoflength[j] = i
            tails[j] = rank[i]
        else:
            lastoflength.append(i)
            tails.append(rank[i])
        # remember element before seq[i] in the subsequence
        predecessor.append(lastoflength[j-1] if j > 0 else None)

    # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1
    def trace(i):
        if i is not None:
            yield from trace(predecessor[i])
            yield i
    indices = trace(lastoflength[-1])

    return list(indices) if index else [seq[i] for i in indices]
I wrote a docstring for the function that I didn't paste above, in order to show off the code:
    """
    Return the longest increasing subsequence of `seq`.

    Parameters
    ----------
    seq : sequence object
        Can be any sequence, like `str`, `list`, `numpy.array`.
    mode : {'strict', 'strictly', 'weak', 'weakly'}, optional
        If set to 'strict', the subsequence will contain unique elements.
        Using 'weak' an element can be repeated many times.
        Modes ending in -ly serve as a convenience to use with `order` parameter,
        because `longest_subsequence(seq, 'weakly', 'increasing')` reads better.
        The default is 'strictly'.
    order : {'increasing', 'decreasing'}, optional
        By default return the longest increasing subsequence, but it is possible
        to return the longest decreasing sequence as well.
    key : function, optional
        Specifies a function of one argument that is used to extract a comparison
        key from each list element (e.g., `str.lower`, `lambda x: x[0]`).
        The default value is `None` (compare the elements directly).
    index : bool, optional
        If set to `True`, return the indices of the subsequence, otherwise return
        the elements. Default is `False`.

    Returns
    -------
    elements : list, optional
        A `list` of elements of the longest subsequence.
        Returned by default and when `index` is set to `False`.
    indices : list, optional
        A `list` of indices pointing to elements in the longest subsequence.
        Returned when `index` is set to `True`.
    """
Some examples:
>>> seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
>>> longest_subsequence(seq)
[0, 2, 6, 9, 11, 15]
>>> longest_subsequence(seq, order='decreasing')
[12, 10, 9, 5, 3]
>>> txt = ("Given an input sequence, what is the best way to find the longest"
" (not necessarily continuous) non-decreasing subsequence.")
>>> ''.join(longest_subsequence(txt))
' ,abdegilnorsu'
>>> ''.join(longest_subsequence(txt, 'weak'))
' ceilnnnnrsssu'
>>> ''.join(longest_subsequence(txt, 'weakly', 'decreasing'))
'vuutttttttssronnnnngeee.'
>>> dates = [
... ('2015-02-03', 'name1'),
... ('2015-02-04', 'nameg'),
... ('2015-02-04', 'name5'),
... ('2015-02-05', 'nameh'),
... ('1929-03-12', 'name4'),
... ('2023-07-01', 'name7'),
... ('2015-02-07', 'name0'),
... ('2015-02-08', 'nameh'),
... ('2015-02-15', 'namex'),
... ('2015-02-09', 'namew'),
... ('1980-12-23', 'name2'),
... ('2015-02-12', 'namen'),
... ('2015-02-13', 'named'),
... ]
>>> longest_subsequence(dates, 'weak')
[('2015-02-03', 'name1'),
('2015-02-04', 'name5'),
('2015-02-05', 'nameh'),
('2015-02-07', 'name0'),
('2015-02-08', 'nameh'),
('2015-02-09', 'namew'),
('2015-02-12', 'namen'),
('2015-02-13', 'named')]
>>> from operator import itemgetter
>>> longest_subsequence(dates, 'weak', key=itemgetter(0))
[('2015-02-03', 'name1'),
('2015-02-04', 'nameg'),
('2015-02-04', 'name5'),
('2015-02-05', 'nameh'),
('2015-02-07', 'name0'),
('2015-02-08', 'nameh'),
('2015-02-09', 'namew'),
('2015-02-12', 'namen'),
('2015-02-13', 'named')]
>>> indices = set(longest_subsequence(dates, key=itemgetter(0), index=True))
>>> [e for i,e in enumerate(dates) if i not in indices]
[('2015-02-04', 'nameg'),
('1929-03-12', 'name4'),
('2023-07-01', 'name7'),
('2015-02-15', 'namex'),
('1980-12-23', 'name2')]
This answer was inspired in part by a question over at Code Review and in part by a question asking about "out of sequence" values.
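The `order='decreasing'` branch of the code above relies on `functools.cmp_to_key` with a reversed comparison, which may not be obvious at first sight. The following is a minimal sketch of that trick in isolation; the names `reverse_key`, `values`, `wrapped` and `pos` are illustrative and not part of the answer.

```python
from bisect import bisect_left
from functools import cmp_to_key

# Reversed three-way comparison: an element counts as "smaller"
# when it is greater in the usual order (illustrative helper).
reverse_key = cmp_to_key(lambda x, y: (x < y) - (x > y))

values = [12, 10, 9, 5, 3]                   # decreasing in the usual order
wrapped = [reverse_key(v) for v in values]   # ascending under the reversed order

# Because the wrapped list is sorted under the reversed order,
# bisect can search it without any further changes.
pos = bisect_left(wrapped, reverse_key(9))
print(pos)  # 2
```

Wrapping every rank once up front, as the answer does, lets the same bisect-based loop serve both orders.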
Here is how to simply find the longest increasing or decreasing subsequence in Mathematica:
LIS[list_] := LongestCommonSequence[Sort[list], list];
input={0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
LIS[input]
-1*LIS[-1*input]
Output:
{0, 2, 6, 9, 11, 15}
{12, 10, 9, 5, 3}
Mathematica also has a LongestIncreasingSubsequence function in the Combinatorica` library. If you do not have Mathematica, you can query WolframAlpha.
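The one-liner above rests on the observation that a longest increasing subsequence of a list is a longest common subsequence of the list and its sorted copy. A rough Python sketch of the same idea follows, using a plain O(n^2) dynamic program. The helper names `lcs` and `lis_via_lcs` are made up for this illustration, and removing duplicates with `set` is an added assumption so that the result is strictly increasing.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def lis_via_lcs(seq):
    # Sorted distinct values form the "target" increasing order.
    return lcs(sorted(set(seq)), list(seq))
```

This is quadratic in time and memory, so it is only meant to illustrate the equivalence, not to compete with the O(n log n) approaches in the other answers.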
C++ O(n log n) solution
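There is also an O(n log n) solution based on some observations. Let $A_{i,j}$ be the smallest possible tail out of all increasing subsequences of length $j$ using elements $a_1, a_2, \dots, a_i$. Observe that, for any particular $i$, $A_{i,1} < A_{i,2} < \dots < A_{i,j}$. This suggests that if we want the longest subsequence that ends with $a_{i+1}$, we only need to look for a $j$ such that $A_{i,j} < a_{i+1} \le A_{i,j+1}$, and the length will be $j+1$. Notice that in this case $A_{i+1,j+1}$ will be equal to $a_{i+1}$, and all $A_{i+1,k}$ will be equal to $A_{i,k}$ for $k \ne j+1$. Furthermore, there is at most one difference between the set $A_i$ and the set $A_{i+1}$, which is caused by this search. Since $A$ is always ordered in increasing order, and the operation does not change this ordering, we can do a binary search for every single $a_1, a_2, \dots, a_n$.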
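To make the observation about the tails concrete, here is a small Python sketch that records the array of smallest tails after each element is processed. The function name `tails_history` is invented for this illustration; it tracks values only and does not reconstruct the subsequence itself.

```python
from bisect import bisect_left


def tails_history(seq):
    """Return the list of smallest tails after each element of seq."""
    tails = []      # tails[j-1] = smallest tail of an increasing subsequence of length j
    history = []
    for x in seq:
        j = bisect_left(tails, x)   # first position whose tail is >= x
        if j == len(tails):
            tails.append(x)         # x extends the longest subsequence so far
        else:
            tails[j] = x            # x is a smaller tail for length j + 1
        history.append(list(tails))
    return history


for row in tails_history([0, 8, 4, 12, 2]):
    print(row)
# [0]
# [0, 8]
# [0, 4]
# [0, 4, 12]
# [0, 2, 12]
```

Each row differs from the previous one in exactly one position, and every row stays sorted, which is what allows the binary search.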
Implementation in C++ (O(n log n) algorithm)
#include <vector>
using namespace std;

/* Finds longest strictly increasing subsequence. O(n log k) algorithm. */
void find_lis(vector<int> &a, vector<int> &b)
{
    vector<int> p(a.size());
    int u, v;

    if (a.empty()) return;

    b.push_back(0);

    for (size_t i = 1; i < a.size(); i++)
    {
        if (a[b.back()] < a[i])
        {
            p[i] = b.back();
            b.push_back(i);
            continue;
        }

        for (u = 0, v = b.size()-1; u < v;)
        {
            int c = (u + v) / 2;
            if (a[b[c]] < a[i]) u=c+1; else v=c;
        }

        if (a[i] < a[b[u]])
        {
            if (u > 0) p[i] = b[u-1];
            b[u] = i;
        }
    }

    for (u = b.size(), v = b.back(); u--; v = p[v]) b[u] = v;
}

/* Example of usage: */
#include <cstdio>
int main()
{
    int a[] = { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 };
    vector<int> seq(a, a+sizeof(a)/sizeof(a[0]));
    vector<int> lis;
    find_lis(seq, lis);

    for (size_t i = 0; i < lis.size(); i++)
        printf("%d ", seq[lis[i]]);
    printf("\n");

    return 0;
}
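Source: link
I rewrote this C++ implementation in Java a while ago and can confirm that it works. In Python, the replacement for vector is list. If you want to test it yourself, here is a link to an online compiler with the example implementation loaded: link
The example data is { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 } and the answer is 1 3 4 5 6 7.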
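Since the answer mentions that a Python list can stand in for the C++ vector, here is one possible line-by-line port as a sketch. It returns the list of indices instead of filling an output parameter, which is a choice made for this illustration and not something the answer specifies.

```python
def find_lis(a):
    """Return indices of one longest strictly increasing subsequence of a."""
    if not a:
        return []
    p = [0] * len(a)   # p[i] = index of the element before a[i] in its subsequence
    b = [0]            # b[k] = index of the smallest tail of length k + 1
    for i in range(1, len(a)):
        if a[b[-1]] < a[i]:
            # a[i] extends the longest subsequence found so far
            p[i] = b[-1]
            b.append(i)
            continue
        # binary search for the first tail that is not smaller than a[i]
        u, v = 0, len(b) - 1
        while u < v:
            c = (u + v) // 2
            if a[b[c]] < a[i]:
                u = c + 1
            else:
                v = c
        if a[i] < a[b[u]]:
            if u > 0:
                p[i] = b[u - 1]
            b[u] = i
    # walk the predecessor links backwards to rebuild the index list
    v = b[-1]
    for u in range(len(b) - 1, -1, -1):
        b[u] = v
        v = p[v]
    return b


data = [1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7]
print([data[i] for i in find_lis(data)])  # [1, 3, 4, 5, 6, 7]
```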
I just stumbled on this problem and came up with this Python 3 implementation:
def subsequence(seq):
    if not seq:
        return seq

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1]

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    return result[::-1]    # reversing
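Since it took me some time to understand how the algorithm works, I was a little verbose with the comments, and I'll also add a quick explanation:
- seq is the input sequence.
- L is a number: it gets updated while looping over the sequence and holds the length of the longest increasing subsequence found up to that moment.
- M is a list. M[j-1] points to the index in seq of the smallest value that can end an increasing subsequence of length j.
- P is a list. P[i] is copied from M[j-1] at the moment seq[i] is processed, where i is an index of seq. In a few words, it tells which element comes before seq[i] in the subsequence. P is used to build the result at the end.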
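How the algorithm works:
- Handle the special case of an empty sequence.
- Start with a subsequence of one element.
- Loop over the input sequence with index i.
- With a binary search, find the largest j such that seq[M[j-1]] is less than seq[i] (j = 0 if there is none).
- Update P, M and L.
- Trace back the result and return it reversed.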
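Note: The only differences from the Wikipedia algorithm are the offset of 1 in the M list, and that X is called seq here. I also tested it with a slightly improved version of the unit test shown in Eric Gustavson's answer, and it passed all tests.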
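The unit tests mentioned above are not reproduced in this thread. As a rough idea of what such a check could look like, the sketch below pairs a simple O(n^2) reference for the expected length with a validity check for a candidate result. Both helper names are hypothetical and are not taken from the tests the answer refers to.

```python
def lis_length_quadratic(seq):
    """Reference length of the longest strictly increasing subsequence, O(n^2)."""
    best = [1] * len(seq)   # best[i] = longest increasing subsequence ending at i
    for i in range(len(seq)):
        for k in range(i):
            if seq[k] < seq[i]:
                best[i] = max(best[i], best[k] + 1)
    return max(best, default=0)


def is_increasing_subsequence(candidate, seq):
    """True if candidate is strictly increasing and appears in seq in order."""
    it = iter(seq)
    in_order = all(x in it for x in candidate)   # consumes seq left to right
    strictly = all(a < b for a, b in zip(candidate, candidate[1:]))
    return in_order and strictly


seq = [30, 10, 20, 50, 40, 80, 60]
print(lis_length_quadratic(seq))                            # 4
print(is_increasing_subsequence([10, 20, 40, 60], seq))     # True
```

A result from any of the implementations in this thread can then be accepted when it passes the validity check and its length equals the reference length.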
Example:
seq = [30, 10, 20, 50, 40, 80, 60]
       0   1   2   3   4   5   6   <-- indexes
At the end we'll have:
M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]
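As you can see, P is pretty straightforward. We have to read it from the end: it says that before 60 there is 40, before 80 there is 40, before 40 there is 20, before 50 there is 20, and before 20 there is 10, stop.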
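Reading P from the end can be written as a short loop. This sketch uses the P list from the example above; the helper name `backtrack` is made up for the illustration.

```python
def backtrack(seq, P, last):
    """Follow predecessor links in P starting from index `last`."""
    out = []
    pos = last
    while pos is not None:
        out.append(seq[pos])
        pos = P[pos]
    return out[::-1]   # links were followed backwards, so reverse


seq = [30, 10, 20, 50, 40, 80, 60]
P = [None, None, 1, 2, 2, 4, 4]
print(backtrack(seq, P, 6))  # [10, 20, 40, 60]
print(backtrack(seq, P, 5))  # [10, 20, 40, 80]
```

Starting from index 6 (the value 60) gives the result shown above; starting from index 5 (the value 80) gives another subsequence of the same length.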
The complicated part is M. At the beginning M was [0, None, None, ...] since the last element of the subsequence of length 1 (hence position 0 in M) was at index 0: 30.
At this point we start looping over seq and look at 10. Since 10 is less than 30, M will be updated:
    if j == L or seq[i] < seq[M[j]]:
        M[j] = i
So now M looks like: [1, None, None, ...]. This is a good thing, because 10 has a better chance of forming a longer increasing subsequence. (The new 1 is the index of 10.)
Now it's the turn of 20. With 10 and 20 we have a subsequence of length 2 (index 1 in M), so M will be: [1, 2, None, ...]. (The new 2 is the index of 20.)
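Next comes 50. It extends 10, 20 to a subsequence of length 3 (index 2 in M), so M becomes [1, 2, 3, None, ...], although 50 will not be part of the final result.
Now it's the turn of 40. Since 40 is less than 50, it is a better tail for a subsequence of length 3: with 10, 20 and 40, M will be [1, 2, 4, None, ...]. (The new 4 is the index of 40.)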
And so on...
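For readers who want to see every intermediate state of M, the sketch below records it after each element. It applies the same update rule as the code above but finds j with `bisect_left` on a parallel list of tail values; that substitution and the name `m_snapshots` are choices made for this illustration, and M is grown on demand here instead of being padded with None.

```python
from bisect import bisect_left


def m_snapshots(seq):
    """Return the state of M after processing each element of seq."""
    M = []       # M[j-1] = index in seq of the smallest tail of length j
    tails = []   # tails[j-1] = seq[M[j-1]], kept sorted for bisect
    snaps = []
    for i, x in enumerate(seq):
        j = bisect_left(tails, x)   # number of tails strictly smaller than x
        if j == len(M):
            M.append(i)             # x starts a new, longer length
            tails.append(x)
        elif x < tails[j]:
            M[j] = i                # x is a smaller tail for length j + 1
            tails[j] = x
        snaps.append(list(M))
    return snaps


for value, state in zip([30, 10, 20, 50, 40, 80, 60],
                        m_snapshots([30, 10, 20, 50, 40, 80, 60])):
    print(value, state)
# 30 [0]
# 10 [1]
# 20 [1, 2]
# 50 [1, 2, 3]
# 40 [1, 2, 4]
# 80 [1, 2, 4, 5]
# 60 [1, 2, 4, 6]
```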
For a complete walk through the code, you can copy and paste it here :)