最长的子序列

给定一个input序列,find最长(不一定是连续的)非递减子序列的最好方法是什么?

0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 # sequence 1, 9, 13, 15 # non-decreasing subsequence 0, 2, 6, 9, 13, 15 # longest non-deceasing subsequence (not unique) 

我正在寻找最好的algorithm。 如果有代码,Python会很好,但任何事情都可以。

我偶然发现了这个问题,并提出了这个Python 3的实现:

 def subsequence(seq): if not seq: return seq M = [None] * len(seq) # offset by 1 (j -> j-1) P = [None] * len(seq) # Since we have at least one element in our list, we can start by # knowing that the there's at least an increasing subsequence of length one: # the first element. L = 1 M[0] = 0 # Looping over the sequence starting from the second element for i in range(1, len(seq)): # Binary search: we want the largest j <= L # such that seq[M[j]] < seq[i] (default j = 0), # hence we want the lower bound at the end of the search process. lower = 0 upper = L # Since the binary search will not look at the upper bound value, # we'll have to check that manually if seq[M[upper-1]] < seq[i]: j = upper else: # actual binary search loop while upper - lower > 1: mid = (upper + lower) // 2 if seq[M[mid-1]] < seq[i]: lower = mid else: upper = mid j = lower # this will also set the default value to 0 P[i] = M[j-1] if j == L or seq[i] < seq[M[j]]: M[j] = i L = max(L, j+1) # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...] result = [] pos = M[L-1] for _ in range(L): result.append(seq[pos]) pos = P[pos] return result[::-1] # reversing 

由于我花了一些时间来理解algorithm的工作原理,所以我对它有点冗长的评论,而且我还会添加一个快速解释:

  • seq是input序列。
  • L是一个数字:它在循环遍历序列时得到更新,它标记了到目前为止发现的最长的增长子序列的长度。
  • M是一个列表。 M[j-1]将指向seq一个索引,该索引保存了可以使用的最小值(最后)来构build长度为j的递增子序列。
  • P是一个列表。 P[i]将指向M[j] ,其中iseq的索引。 简而言之,它告诉哪个是子序列的前一个元素。 P用来在最后build立结果。

algorithm如何工作:

  1. 处理空序列的特殊情况。
  2. 从1个元素的子序列开始。
  3. 循环索引为i的input序列。
  4. 用二进制searchfind使seq[M[j] < seq[i] seq[M[j]
  5. 更新PML
  6. 追溯结果并将其返回。

注意:与维基百科algorithm唯一的区别是M列表中的偏移量为1,而X在这里被称为seq 。 我还用Eric Gustavson所做的一个稍微改进的unit testing版来testing它,并通过了所有的testing。


例:

 seq = [30, 10, 20, 50, 40, 80, 60] 0 1 2 3 4 5 6 <-- indexes 

最后我们会有:

 M = [1, 2, 4, 6, None, None, None] P = [None, None, 1, 2, 2, 4, 4] result = [10, 20, 40, 60] 

你会看到P很简单。 我们必须从最后看它,所以它告诉60之前有40,80之前有40 ,在40之前有20 ,在50之前有20 ,在20之前有10 ,停止。

复杂的部分在M 。 由于长度为1的子序列(因此M位置0)的最后一个元素处于索引0:30,所以在开始时M[0, None, None, ...]

在这一点上,我们将开始在seq循环,看看10 ,因为10 < 30M会被更新:

 if j == L or seq[i] < seq[M[j]]: M[j] = i 

所以现在M看起来像: [1, None, None, ...] 。 这是一件好事,因为有更多chanches创造更长的子序列。 (新的1是10的索引)

现在轮到20 。 在1020我们有长度为2的子序列( M索引1),所以M将是: [1, 2, None, ...] 。 (新2是20的指数)

现在轮到5050不会成为任何子序列的一部分,所以没有任何变化。

现在轮到40 。 对于1040我们有一个长度为3的子( M索引2,所以M将是: [1, 2, 4, None, ...] (新的4是40的索引)

等等…

要完整浏览代码,您可以复制并粘贴到这里 🙂

这里是如何简单地findMathematica中最长的递增/递减子序列:

  LIS[list_] := LongestCommonSequence[Sort[list], list]; input={0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}; LIS[input] -1*LIS[-1*input] 

输出:

 {0, 2, 6, 9, 11, 15} {12, 10, 9, 5, 3} 

Mathematica在Combinatorica库中也有LongestIncreasingSubsequence函数。 如果你没有Mathematica,你可以查询WolframAlpha 。

C ++ O(nlogn)解决scheme

根据一些观察,还有一个O(nlogn)解决scheme。 设Ai,j是长度为j的所有递增子序列中使用元素a 1 ,a 2 ,…,a i的最小可能尾数。 观察到,对于任何特定的我,A 我,1 ,A 我,2 ,…,A 我,J 。 这表明,如果我们想要以ai + 1结尾的最长子序列,我们只需要查找aj,使得Ai,j <ai + 1 <= Ai,j + 1,长度将是j + 1。在这种情况下,Ai + 1,j + 1将等于ai + 1,并且对于k!= j + 1,所有Ai + 1,k将等于Ai,k。 此外,集合Ai和集合Ai + 1之间最多只有一个区别,这是由search引起的。 由于A总是按递增次序排列,并且操作不会改变这个顺序,所以我们可以对每一个a 1 ,a 2 ,…,a n进行二进制search。

实现C ++ (O(nlogn)algorithm)

 #include <vector> using namespace std; /* Finds longest strictly increasing subsequence. O(n log k) algorithm. */ void find_lis(vector<int> &a, vector<int> &b) { vector<int> p(a.size()); int u, v; if (a.empty()) return; b.push_back(0); for (size_t i = 1; i < a.size(); i++) { if (a[b.back()] < a[i]) { p[i] = b.back(); b.push_back(i); continue; } for (u = 0, v = b.size()-1; u < v;) { int c = (u + v) / 2; if (a[b[c]] < a[i]) u=c+1; else v=c; } if (a[i] < a[b[u]]) { if (u > 0) p[i] = b[u-1]; b[u] = i; } } for (u = b.size(), v = b.back(); u--; v = p[v]) b[u] = v; } /* Example of usage: */ #include <cstdio> int main() { int a[] = { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 }; vector<int> seq(a, a+sizeof(a)/sizeof(a[0])); vector<int> lis; find_lis(seq, lis); for (size_t i = 0; i < lis.size(); i++) printf("%d ", seq[lis[i]]); printf("\n"); return 0; } 

来源: 链接

前段时间我已经将C ++实现重写为Java,并且可以确认它的工作原理。 Python中的vector替代是List。 但是如果你想自己testing一下,这里是链接在线编译器加载的示例实现: 链接

示例数据是: { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 }和答案: 1 3 4 5 6 7

这里是一些python代码,testing实现了在O(n * log(n))中运行的algorithm。 我在维基百科讨论页上find了关于时间最长的子序列 。

 import unittest def LongestIncreasingSubsequence(X): """ Find and return longest increasing subsequence of S. If multiple increasing subsequences exist, the one that ends with the smallest value is preferred, and if multiple occurrences of that value can end the sequence, then the earliest occurrence is preferred. """ n = len(X) X = [None] + X # Pad sequence so that it starts at X[1] M = [None]*(n+1) # Allocate arrays for M and P P = [None]*(n+1) L = 0 for i in range(1,n+1): if L == 0 or X[M[1]] >= X[i]: # there is no j st X[M[j]] < X[i]] j = 0 else: # binary search for the largest j st X[M[j]] < X[i]] lo = 1 # largest value known to be <= j hi = L+1 # smallest value known to be > j while lo < hi - 1: mid = (lo + hi)//2 if X[M[mid]] < X[i]: lo = mid else: hi = mid j = lo P[i] = M[j] if j == L or X[i] < X[M[j+1]]: M[j+1] = i L = max(L,j+1) # Backtrack to find the optimal sequence in reverse order output = [] pos = M[L] while L > 0: output.append(X[pos]) pos = P[pos] L -= 1 output.reverse() return output # Try small lists and check that the correct subsequences are generated. class LISTest(unittest.TestCase): def testLIS(self): self.assertEqual(LongestIncreasingSubsequence([]),[]) self.assertEqual(LongestIncreasingSubsequence(range(10,0,-1)),[1]) self.assertEqual(LongestIncreasingSubsequence(range(10)),range(10)) self.assertEqual(LongestIncreasingSubsequence(\ [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9]), [1,2,3,5,8,9]) unittest.main() 
  int[] a = {1,3,2,4,5,4,6,7}; StringBuilder s1 = new StringBuilder(); for(int i : a){ s1.append(i); } StringBuilder s2 = new StringBuilder(); int count = findSubstring(s1.toString(), s2); System.out.println(s2.reverse()); public static int findSubstring(String str1, StringBuilder s2){ StringBuilder s1 = new StringBuilder(str1); if(s1.length() == 0){ return 0; } if(s2.length() == 0){ s2.append(s1.charAt(s1.length()-1)); findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2); } else if(s1.charAt(s1.length()-1) < s2.charAt(s2.length()-1)){ char c = s1.charAt(s1.length()-1); return 1 + findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2.append(c)); } else{ char c = s1.charAt(s1.length()-1); StringBuilder s3 = new StringBuilder(); for(int i=0; i < s2.length(); i++){ if(s2.charAt(i) > c){ s3.append(s2.charAt(i)); } } s3.append(c); return Math.max(findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2), findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s3)); } return 0; } 

这是一个非常普遍的解决scheme:

  • 运行在O(n log n)时间,
  • 处理递增,不递减,递减和不递增的子序列,
  • 与任何序列对象,包括listnumpy.arraystr和更多,
  • 支持对象列表和自定义比较方法,通过与内置sorted函数类似的key参数,
  • 可以返回子序列的元素或其索引。

代码:

 from bisect import bisect_left, bisect_right from functools import cmp_to_key def longest_subsequence(seq, mode='strictly', order='increasing', key=None, index=False): bisect = bisect_left if mode.startswith('strict') else bisect_right # compute keys for comparison just once rank = seq if key is None else map(key, seq) if order == 'decreasing': rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank) rank = list(rank) if not rank: return [] lastoflength = [0] # end position of subsequence with given length predecessor = [None] # penultimate element of lis ending at given position for i in range(1, len(seq)): # seq[i] can extend a subsequence that ends with a lesser (or equal) element j = bisect([rank[k] for k in lastoflength], rank[i]) # update existing subsequence of length j or extend the longest try: lastoflength[j] = i except: lastoflength.append(i) # remember element before seq[i] in the subsequence predecessor.append(lastoflength[j-1] if j > 0 else None) # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1 def trace(i): if i is not None: yield from trace(predecessor[i]) yield i indices = trace(lastoflength[-1]) return list(indices) if index else [seq[i] for i in indices] 

我写了一个文档string的function,我没有粘贴上面,以显示代码:

 """ Return the longest increasing subsequence of `seq`. Parameters ---------- seq : sequence object Can be any sequence, like `str`, `list`, `numpy.array`. mode : {'strict', 'strictly', 'weak', 'weakly'}, optional If set to 'strict', the subsequence will contain unique elements. Using 'weak' an element can be repeated many times. Modes ending in -ly serve as a convenience to use with `order` parameter, because `longest_sequence(seq, 'weakly', 'increasing')` reads better. The default is 'strict'. order : {'increasing', 'decreasing'}, optional By default return the longest increasing subsequence, but it is possible to return the longest decreasing sequence as well. key : function, optional Specifies a function of one argument that is used to extract a comparison key from each list element (eg, `str.lower`, `lambda x: x[0]`). The default value is `None` (compare the elements directly). index : bool, optional If set to `True`, return the indices of the subsequence, otherwise return the elements. Default is `False`. Returns ------- elements : list, optional A `list` of elements of the longest subsequence. Returned by default and when `index` is set to `False`. indices : list, optional A `list` of indices pointing to elements in the longest subsequence. Returned when `index` is set to `True`. """ 

一些例子:

 >>> seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15] >>> longest_subsequence(seq) [0, 2, 6, 9, 11, 15] >>> longest_subsequence(seq, order='decreasing') [12, 10, 9, 5, 3] >>> txt = ("Given an input sequence, what is the best way to find the longest" " (not necessarily continuous) non-decreasing subsequence.") >>> ''.join(longest_subsequence(txt)) ' ,abdegilnorsu' >>> ''.join(longest_subsequence(txt, 'weak')) ' ceilnnnnrsssu' >>> ''.join(longest_subsequence(txt, 'weakly', 'decreasing')) 'vuutttttttssronnnnngeee.' >>> dates = [ ... ('2015-02-03', 'name1'), ... ('2015-02-04', 'nameg'), ... ('2015-02-04', 'name5'), ... ('2015-02-05', 'nameh'), ... ('1929-03-12', 'name4'), ... ('2023-07-01', 'name7'), ... ('2015-02-07', 'name0'), ... ('2015-02-08', 'nameh'), ... ('2015-02-15', 'namex'), ... ('2015-02-09', 'namew'), ... ('1980-12-23', 'name2'), ... ('2015-02-12', 'namen'), ... ('2015-02-13', 'named'), ... ] >>> longest_subsequence(dates, 'weak') [('2015-02-03', 'name1'), ('2015-02-04', 'name5'), ('2015-02-05', 'nameh'), ('2015-02-07', 'name0'), ('2015-02-08', 'nameh'), ('2015-02-09', 'namew'), ('2015-02-12', 'namen'), ('2015-02-13', 'named')] >>> from operator import itemgetter >>> longest_subsequence(dates, 'weak', key=itemgetter(0)) [('2015-02-03', 'name1'), ('2015-02-04', 'nameg'), ('2015-02-04', 'name5'), ('2015-02-05', 'nameh'), ('2015-02-07', 'name0'), ('2015-02-08', 'nameh'), ('2015-02-09', 'namew'), ('2015-02-12', 'namen'), ('2015-02-13', 'named')] >>> indices = set(longest_subsequence(dates, key=itemgetter(0), index=True)) >>> [e for i,e in enumerate(dates) if i not in indices] [('2015-02-04', 'nameg'), ('1929-03-12', 'name4'), ('2023-07-01', 'name7'), ('2015-02-15', 'namex'), ('1980-12-23', 'name2')] 

这个答案部分地受到了Code Review上的问题的启发,并且部分地由问题提出了“无序”值 。

这里最有效的algorithm是O(NlogN)。

解决这个问题的另一种方法是采用原始数组中最长的公共子序列 (LCS),它是sorting的版本,它需要O(N 2 )个时间。

这里是与Java的代码和解释,可能是我将很快添加python。

 arr = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15} 
  1. list = {0} – 将列表初始化为空集
  2. 列表= {0,8} – 新的最大的LIS
  3. list = {0,4} – 将8更改为4
  4. 列表= {0,4,12}​​ – 新的最大的LIS
  5. list = {0,2,12} – 将4更改为2
  6. list = {0,2,10} – 将12更改为10
  7. list = {0,2,6} – 将10更改为6
  8. 列表= {0,2,6,14} – 新的最大的LIS
  9. list = {0,1,6,14} – 将2更改为1
  10. list = {0,1,6,9} – 将14改为9
  11. list = {0,1,5,9} – 将6更改为5
  12. list = {0,1,6,9,13} – 将3更改为2
  13. 列表= {0,1,3,9,11} – 新的最大的LIS
  14. list = {0,1,3,9,11} – 将9更改为5
  15. 列表= {0,1,3,7,11} – 新的最大的LIS
  16. list = {0,1,3,7,11,15} – 新的最大的LIS

所以LIS的长度是6(列表的大小)。

 import java.util.ArrayList; import java.util.List; public class LongestIncreasingSubsequence { public static void main(String[] args) { int[] arr = { 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 }; increasingSubsequenceValues(arr); } public static void increasingSubsequenceValues(int[] seq) { List<Integer> list = new ArrayList<Integer>(); for (int i = 0; i < seq.length; i++) { int j = 0; boolean elementUpdate = false; for (; j < list.size(); j++) { if (list.get(j) > seq[i]) { list.add(j, seq[i]); list.remove(j + 1); elementUpdate = true; break; } } if (!elementUpdate) { list.add(j, seq[i]); } } System.out.println("Longest Increasing Subsequence" + list); } } 

上述代码的输出:最长增加子序列[0,1,3,7,11,15]

这里是一个紧凑的实现使用“枚举”

 def lis(l): # we will create a list of lists where each sub-list contains # the longest increasing subsequence ending at this index lis = [[e] for e in l] # start with just the elements of l as contents of the sub-lists # iterate over (index,value) of l for i, e in enumerate(l): # (index,value) tuples for elements b where b<e and a<i lower_tuples = filter(lambda (a,b): b<e, enumerate(l[:i])) # if no such items, nothing to do if not lower_tuples: continue # keep the lis-es of such items lowerlises = [lis[a] for a,b in lower_tuples ] # choose the longest one of those and add # to the current element's lis lis[i] = max(lowerlises, key=len) + [e] # retrun the longest of lis-es return max(lis, key=len) 

下面是一个更紧凑但仍然有效的Python实现:

 def longest_increasing_subsequence_indices(seq): from bisect import bisect_right if len(seq) == 0: return seq # m[j] in iteration i is the last index of the increasing subsequence of seq[:i] # that ends with the lowest possible value while having length j m = [None] * len(seq) predecessor = [None] * len(seq) best_len = 0 for i, item in enumerate(seq): j = bisect_right([seq[k] for k in m[:best_len]], item) m[j] = i predecessor[i] = m[j-1] if j > 0 else None best_len = max(best_len, j+1) result = [] i = m[best_len-1] while i is not None: result.append(i) i = predecessor[i] result.reverse() return result def longest_increasing_subsequence(seq): return [seq[i] for i in longest_increasing_subsequence_indices(seq)]