Longest increasing subsequence

Asked 2025-04-16 05:53

Given an input sequence, what is the best way to find the longest (not necessarily contiguous) increasing subsequence?

[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]  # input

[1, 9, 13, 15]  # an example of an increasing subsequence (not the longest)

[0, 2, 6, 9, 13, 15]  # longest increasing subsequence (not a unique answer)
[0, 2, 6, 9, 11, 15]  # another possible solution

I'm looking for the best algorithm. Code in Python would be nice, but any language is fine.

Tags: data-structures, algorithm, optimization, dynamic-programming, longest-increasing-subsequence

13 Answers

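Here is a fairly general solution that:

runs in O(n log n) time, so it scales to large inputs,
handles increasing, non-increasing, decreasing and non-decreasing subsequences,
works with any sequence object, including list, numpy.array, str and more,
supports lists of objects and custom comparisons through the key parameter, which works like the one in the builtin sorted function,
can return either the elements of the subsequence or their indices.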

Here is the code:

from bisect import bisect_left, bisect_right
from functools import cmp_to_key

def longest_subsequence(seq, mode='strictly', order='increasing',
                        key=None, index=False):

  bisect = bisect_left if mode.startswith('strict') else bisect_right

  # compute keys for comparison just once
  rank = seq if key is None else map(key, seq)
  if order == 'decreasing':
    rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank)
  rank = list(rank)

  if not rank: return []

  lastoflength = [0] # end position of subsequence with given length
  tailrank = [rank[0]] # rank of the element at each position of lastoflength
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # seq[i] can extend a subsequence that ends with a lesser (or equal) element;
    # bisecting the maintained tailrank list keeps each step O(log n)
    j = bisect(tailrank, rank[i])
    # update existing subsequence of length j or extend the longest
    if j < len(lastoflength):
      lastoflength[j] = i
      tailrank[j] = rank[i]
    else:
      lastoflength.append(i)
      tailrank.append(rank[i])
    # remember element before seq[i] in the subsequence
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1;
  # done iteratively so long subsequences do not hit the recursion limit
  indices = []
  i = lastoflength[-1]
  while i is not None:
    indices.append(i)
    i = predecessor[i]
  indices.reverse()

  return indices if index else [seq[i] for i in indices]

I also wrote a docstring for this function, which I left out above to keep the code short:

"""
Return the longest increasing subsequence of `seq`.

Parameters
----------
seq : sequence object
  Can be any sequence, like `str`, `list`, `numpy.array`.
mode : {'strict', 'strictly', 'weak', 'weakly'}, optional
  If set to 'strict', the subsequence will contain unique elements.
  Using 'weak' an element can be repeated many times.
  Modes ending in -ly serve as a convenience to use with `order` parameter,
  because `longest_subsequence(seq, 'weakly', 'increasing')` reads better.
  The default is 'strict'.
order : {'increasing', 'decreasing'}, optional
  By default return the longest increasing subsequence, but it is possible
  to return the longest decreasing subsequence as well.
key : function, optional
  Specifies a function of one argument that is used to extract a comparison
  key from each list element (e.g., `str.lower`, `lambda x: x[0]`).
  The default value is `None` (compare the elements directly).
index : bool, optional
  If set to `True`, return the indices of the subsequence, otherwise return
  the elements. Default is `False`.

Returns
-------
elements : list, optional
  A `list` of elements of the longest subsequence.
  Returned by default and when `index` is set to `False`.
indices : list, optional
  A `list` of indices pointing to elements in the longest subsequence.
  Returned when `index` is set to `True`.
"""

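A minimal sketch of why the single `bisect = bisect_left if ... else bisect_right` line is enough to switch between strict and weak mode. It assumes plain comparable values and computes only the length; the helper name `lis_length` is hypothetical and not part of the answer:

```python
from bisect import bisect_left, bisect_right

def lis_length(seq, strict=True):
    """Length of the longest increasing subsequence (hypothetical helper).

    bisect_left lands on the first tail >= x, so a value equal to a tail
    replaces it instead of extending (strictly increasing). bisect_right
    lands on the first tail > x, so equal values can extend (non-decreasing).
    """
    bisect = bisect_left if strict else bisect_right
    tails = []  # tails[k]: smallest tail of a subsequence of length k+1
    for x in seq:
        j = bisect(tails, x)
        if j == len(tails):
            tails.append(x)   # x extends the longest subsequence so far
        else:
            tails[j] = x      # x is a smaller tail for length j+1
    return len(tails)
```

For example, `lis_length([1, 1, 2, 2])` returns 2, while `lis_length([1, 1, 2, 2], strict=False)` returns 4, because with `bisect_right` the second 1 and the second 2 are appended after their equal predecessors.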
Some examples:

>>> seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

>>> longest_subsequence(seq)
[0, 2, 6, 9, 11, 15]

>>> longest_subsequence(seq, order='decreasing')
[12, 10, 9, 5, 3]

>>> txt = ("Given an input sequence, what is the best way to find the longest"
...        " (not necessarily continuous) non-decreasing subsequence.")

>>> ''.join(longest_subsequence(txt))
' ,abdegilnorsu'

>>> ''.join(longest_subsequence(txt, 'weak'))
'              ceilnnnnrsssu'

>>> ''.join(longest_subsequence(txt, 'weakly', 'decreasing'))
'vuutttttttssronnnnngeee.'

>>> dates = [
...   ('2015-02-03', 'name1'),
...   ('2015-02-04', 'nameg'),
...   ('2015-02-04', 'name5'),
...   ('2015-02-05', 'nameh'),
...   ('1929-03-12', 'name4'),
...   ('2023-07-01', 'name7'),
...   ('2015-02-07', 'name0'),
...   ('2015-02-08', 'nameh'),
...   ('2015-02-15', 'namex'),
...   ('2015-02-09', 'namew'),
...   ('1980-12-23', 'name2'),
...   ('2015-02-12', 'namen'),
...   ('2015-02-13', 'named'),
... ]

>>> longest_subsequence(dates, 'weak')
[('2015-02-03', 'name1'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> from operator import itemgetter

>>> longest_subsequence(dates, 'weak', key=itemgetter(0))
[('2015-02-03', 'name1'),
 ('2015-02-04', 'nameg'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> indices = set(longest_subsequence(dates, key=itemgetter(0), index=True))

>>> [e for i, e in enumerate(dates) if i not in indices]
[('2015-02-04', 'nameg'),
 ('1929-03-12', 'name4'),
 ('2023-07-01', 'name7'),
 ('2015-02-15', 'namex'),
 ('1980-12-23', 'name2')]

This answer was inspired partly by a question on Code Review and partly by a question about "out-of-sequence" values.

Answered 2025-04-16 by Python大师


Here is how to find the longest increasing or decreasing subsequence in Mathematica:

 LIS[list_] := LongestCommonSequence[Sort[list], list];
 input={0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
 LIS[input]
 -1*LIS[-1*input]

Output:

{0, 2, 6, 9, 11, 15}
{12, 10, 9, 5, 3}

Mathematica also has a LongestIncreasingSubsequence function in the Combinatorica` package. If you don't have Mathematica, you can query WolframAlpha instead.
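The same idea, LIS = longest common subsequence of the sorted input and the input, can be sketched in Python with the textbook quadratic LCS table. This is an illustration rather than an efficient method. `lis_via_lcs` is a hypothetical name, elements are assumed hashable, and deduplicating with `set` before sorting is my assumption to keep the result strictly increasing when values repeat (sorting alone would allow a non-decreasing result):

```python
def lis_via_lcs(seq):
    """Longest strictly increasing subsequence as LCS(sorted distinct values, seq)."""
    target = sorted(set(seq))          # strictly increasing reference sequence
    n, m = len(seq), len(target)
    # dp[i][j] = length of the LCS of seq[:i] and target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq[i - 1] == target[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back through the table to recover one common subsequence
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if seq[i - 1] == target[j - 1]:
            out.append(seq[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(lis_via_lcs(seq))                              # one LIS, of length 6
print([-x for x in lis_via_lcs([-x for x in seq])])  # one decreasing subsequence, of length 5
```

The second call mirrors `-1*LIS[-1*input]`: negating the input turns a decreasing subsequence into an increasing one.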

C++ O(n log n) solution

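There is also an O(n log n) solution based on a few observations. Let $A_{i,j}$ be the smallest possible tail among all increasing subsequences of length $j$ that use only the elements $a_1, a_2, \dots, a_i$. Observe that, for any particular $i$, $A_{i,1} < A_{i,2} < \dots < A_{i,j}$. This suggests that if we want the longest subsequence ending with $a_{i+1}$, we only need to find a $j$ such that $A_{i,j} < a_{i+1} \le A_{i,j+1}$ (taking $A_{i,0} = -\infty$, and $A_{i,j+1} = +\infty$ when no subsequence of length $j+1$ exists yet), and the length will be $j+1$.

Notice that in this case $A_{i+1,j+1}$ becomes equal to $a_{i+1}$, while $A_{i+1,k} = A_{i,k}$ for all $k \ne j+1$. Furthermore, there is at most one difference between the set $A_i$ and the set $A_{i+1}$, and it is caused by this search. Since $A_i$ is always sorted in increasing order and this update does not change that ordering, we can do a binary search for every single $a_1, a_2, \dots, a_n$.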
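The two observations (each $A_i$ is sorted, and $A_{i+1}$ differs from $A_i$ in at most one slot) can be checked with a short Python sketch. It assumes strictly increasing subsequences and stores $A_{i,j}$ at 0-based position `j-1` of a list; the generator name `tails_history` is hypothetical:

```python
from bisect import bisect_left

def tails_history(seq):
    """Yield a copy of A_i after each element a_i is processed.

    Position j-1 of each yielded list holds A_{i,j}: the smallest tail of any
    strictly increasing subsequence of length j built from a_1..a_i.
    """
    tails = []
    for x in seq:
        # bisect_left returns the j with A_{i,j} < x <= A_{i,j+1},
        # i.e. the number of tails strictly smaller than x
        j = bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[j] = x      # x becomes the new, smaller A_{i+1,j+1}
        yield list(tails)

# Check both observations on the question's input
seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
prev = []
for A in tails_history(seq):
    assert all(x < y for x, y in zip(A, A[1:]))                 # A_i stays sorted
    diffs = sum(a != b for a, b in zip(A, prev)) + len(A) - len(prev)
    assert diffs <= 1                                            # at most one slot changes
    prev = A
print(len(prev))  # 6, the length of the LIS
```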

Implementation in C++ (O(n log n) algorithm)
#include <cstdio>
#include <vector>
using namespace std;

/* Finds longest strictly increasing subsequence. O(n log k) algorithm. */
void find_lis(const vector<int> &a, vector<int> &b)
{
  vector<int> p(a.size());  // p[i]: index of the predecessor of a[i]
  int u, v;

  if (a.empty()) return;

  b.push_back(0);

  for (size_t i = 1; i < a.size(); i++) {
    // If a[i] is greater than the last element of the current longest
    // subsequence, a[b.back()], just append it to b
    if (a[b.back()] < a[i]) {
      p[i] = b.back();
      b.push_back(i);
      continue;
    }

    // Binary search for the first element referenced by b that is not
    // smaller than a[i]
    for (u = 0, v = b.size()-1; u < v;) {
      int c = (u + v) / 2;
      if (a[b[c]] < a[i]) u=c+1; else v=c;
    }

    // Update b if a[i] is smaller than the tail it would replace
    if (a[i] < a[b[u]]) {
      if (u > 0) p[i] = b[u-1];
      b[u] = i;
    }
  }

  // Follow the predecessor links back from the last tail to rebuild b
  for (u = b.size(), v = b.back(); u--; v = p[v]) b[u] = v;
}

/* Example of usage: */
int main()
{
  int a[] = { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 };
  vector<int> seq(a, a+sizeof(a)/sizeof(a[0]));
  vector<int> lis;
  find_lis(seq, lis);

  for (size_t i = 0; i < lis.size(); i++)
    printf("%d ", seq[lis[i]]);
  printf("\n");

  return 0;
}

Source: link

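I ported this C++ implementation to Java a while ago and can confirm that it works. In Python, the counterpart of vector is list. If you want to test it yourself, here is a link to an online compiler with the example implementation loaded: link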

The example data is { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 } and the answer is 1 3 4 5 6 7.
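Since a Python list can stand in for vector, here is a sketch of a line-by-line Python port of `find_lis`. One assumption of mine: the port returns the list of indices instead of filling an output parameter, so its signature differs from the C++ version:

```python
def find_lis(a):
    """Indices of a longest strictly increasing subsequence of a (port of the C++ code)."""
    if not a:
        return []
    p = [0] * len(a)   # p[i]: index of the predecessor of a[i]
    b = [0]            # b[k]: index of the smallest tail of a subsequence of length k+1
    for i in range(1, len(a)):
        # a[i] is larger than the current longest subsequence's tail: extend it
        if a[b[-1]] < a[i]:
            p[i] = b[-1]
            b.append(i)
            continue
        # Binary search for the first tail that is not smaller than a[i]
        u, v = 0, len(b) - 1
        while u < v:
            c = (u + v) // 2
            if a[b[c]] < a[i]:
                u = c + 1
            else:
                v = c
        # Replace that tail only if a[i] is strictly smaller
        if a[i] < a[b[u]]:
            if u > 0:
                p[i] = b[u - 1]
            b[u] = i
    # Follow the predecessor links back from the last tail
    v = b[-1]
    for u in range(len(b) - 1, -1, -1):
        b[u] = v
        v = p[v]
    return b

seq = [1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7]
print(' '.join(str(seq[i]) for i in find_lis(seq)))  # 1 3 4 5 6 7
```

This prints the same `1 3 4 5 6 7` as the C++ example.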

Answered 2025-04-16 by Python大师


I just ran into this problem and came up with this Python 3 implementation:

def subsequence(seq):
    if not seq:
        return seq

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j-1]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1] if j > 0 else None    # j == 0: no predecessor

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    return result[::-1]    # reversing

Since it took me a while to understand how the algorithm works, my comments are fairly verbose; here is also a short explanation:

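seq is the input sequence.
L is a number: it is updated while looping over the sequence and holds the length of the longest increasing subsequence found so far.
M is a list. M[j-1] points to the index in seq of the smallest value that can end an increasing subsequence of length j.
P is a list. P[i] holds the index of the element that precedes seq[i] in its subsequence (the value of M[j-1] when seq[i] was processed, or None). In short, it tells us the previous element of the subsequence, and it is used at the end to build the result.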

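How the algorithm works:

Handle the special case of an empty sequence.
Start with a subsequence of one element.
Loop over the input sequence with index i.
Use a binary search to find the largest j such that seq[M[j-1]] < seq[i] (j = 0 if there is none).
Update P, M and L.
Trace back the result and return it reversed.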

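Note: the only differences from the Wikipedia algorithm are the offset of 1 in the M list and that X is called seq here. I also tested it against a slightly improved version of the unit tests from Eric Gustavson's answer, and it passed all of them.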
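The unit tests mentioned above are not reproduced here. As a stand-in, a minimal randomized check against an exhaustive search might look like the sketch below. All helper names are hypothetical, and inputs are kept small because the brute force is exponential:

```python
import itertools
import random

def brute_force_lis(seq):
    """Exhaustive reference: try every subset, longest first."""
    for r in range(len(seq), 0, -1):
        for combo in itertools.combinations(seq, r):  # keeps input order
            if all(a < b for a, b in zip(combo, combo[1:])):
                return list(combo)
    return []

def is_increasing_subsequence(candidate, seq):
    """True if candidate is strictly increasing and occurs in seq in order."""
    it = iter(seq)
    in_order = all(any(x == y for y in it) for x in candidate)
    return in_order and all(a < b for a, b in zip(candidate, candidate[1:]))

def check_lis(fn, trials=300, max_len=10, seed=0):
    """Compare fn with the brute force on small random inputs (repeats included)."""
    rng = random.Random(seed)
    for _ in range(trials):
        seq = [rng.randint(0, 5) for _ in range(rng.randint(0, max_len))]
        result = list(fn(seq))
        assert is_increasing_subsequence(result, seq), (seq, result)
        assert len(result) == len(brute_force_lis(seq)), (seq, result)
```

With the `subsequence` function from this answer defined, `check_lis(subsequence)` is expected to return silently; a failing case raises `AssertionError` carrying the offending input and result.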

Example:

seq = [30, 10, 20, 50, 40, 80, 60]

       0   1   2   3   4   5   6    <-- indexes

At the end we'll have:

M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]

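As you can see, P is pretty straightforward. We read it from the end: it tells us that before 60 there is 40, before 80 there is 40, before 40 there is 20, before 50 there is 20, and before 20 there is 10; then we stop.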

The tricky part is M. At the beginning M is [0, None, None, ...], because the last element of the subsequence of length 1 (hence position 0 in M) is 30, at index 0.

Then we start looping over seq and look at 10. Since 10 < 30, M is updated:

if j == L or seq[i] < seq[M[j]]:
    M[j] = i

So now M looks like [1, None, None, ...]. This is a good thing, because 10 has a better chance of starting a longer increasing subsequence. (The new 1 is the index of 10.)

Now it's the turn of 20. With 10 and 20 we have a subsequence of length 2 (index 1 in M), so M becomes [1, 2, None, ...]. (The new 2 is the index of 20.)

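Next comes 50. 50 is larger than 20, so it extends 10, 20 to a subsequence of length 3 (index 2 in M), and M becomes [1, 2, 3, None, ...]. (The new 3 is the index of 50.)

Now it's the turn of 40. 40 also extends 10, 20 to length 3, and since 40 < 50 it is the better tail, so M becomes [1, 2, 4, None, ...]. (The new 4 is the index of 40.)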

And so on...
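To reproduce this walkthrough step by step, here is a sketch that keeps the same M and P conventions but, as an assumption of mine, finds j with `bisect_left` on a parallel list of tail values instead of the hand-written search, and records M after each element. The name `subsequence_with_trace` is hypothetical:

```python
from bisect import bisect_left

def subsequence_with_trace(seq):
    """Same M/P bookkeeping as subsequence(), plus a snapshot of M per element."""
    n = len(seq)
    M = [None] * n      # M[j-1]: index of the smallest tail of a length-j subsequence
    P = [None] * n      # P[i]: index of the element preceding seq[i]
    tails = []          # tails[k] == seq[M[k]]; stays sorted
    snapshots = []
    for i, x in enumerate(seq):
        j = bisect_left(tails, x)          # number of tails strictly smaller than x
        P[i] = M[j - 1] if j > 0 else None
        if j == len(tails):                # x extends the longest subsequence
            tails.append(x)
            M[j] = i
        elif x < tails[j]:                 # x is a smaller tail for length j+1
            tails[j] = x
            M[j] = i
        snapshots.append((x, M[:len(tails)]))
    # Follow P backwards from the tail of the longest subsequence
    result, pos = [], (M[len(tails) - 1] if tails else None)
    while pos is not None:
        result.append(seq[pos])
        pos = P[pos]
    return result[::-1], M, P, snapshots

result, M, P, snapshots = subsequence_with_trace([30, 10, 20, 50, 40, 80, 60])
for value, m in snapshots:
    print(value, m)
# 30 [0]
# 10 [1]
# 20 [1, 2]
# 50 [1, 2, 3]
# 40 [1, 2, 4]
# 80 [1, 2, 4, 5]
# 60 [1, 2, 4, 6]
print(result)  # [10, 20, 40, 60]
```

The final M and P match the values listed above, and the snapshot for 50 shows it extending 10, 20 before 40 replaces it as the length-3 tail.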

If you want the complete code, you can copy and paste it from here :)

Answered 2025-04-16 by Python大师

