确定一个序列是否在Python中是另一个序列的最好方法

这是“更多”任意types的“string contains substring”问题的泛化。

给定一个序列（如列表或元组），确定是否有其他序列是最好的方法是什么？作为奖励，它应该返回子序列开始的元素的索引：

用法示例（序列中的序列）：

>>> seq_in_seq([5,6], [4,'a',3,5,6]) 3 >>> seq_in_seq([5,7], [4,'a',3,5,6]) -1 # or None, or whatever

到目前为止，我只是依靠暴力，看起来很慢，很丑，笨拙。

其次是Knuth-Morris-Prattalgorithm。顺便说一下，你的问题（和KMP解决scheme）在Python Cookbook第二版中完全是配方5.13。你可以在http://code.activestate.com/recipes/117214/find相关的代码;

它找出给定序列中所有正确的子序列，并应该用作迭代器：

 >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s 3 >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s (nothing)

这是一个powershell方法O(n*m) （类似于@ mcella的答案）。对于小input序列，它可能比纯Python O(n+m)的Knuth-Morris-Prattalgorithm实现更快（请参阅@Gregg Lind答案）。

 #!/usr/bin/env python def index(subseq, seq): """Return an index of `subseq`uence in the `seq`uence. Or `-1` if `subseq` is not a subsequence of the `seq`. The time complexity of the algorithm is O(n*m), where n, m = len(seq), len(subseq) >>> index([1,2], range(5)) 1 >>> index(range(1, 6), range(5)) -1 >>> index(range(5), range(5)) 0 >>> index([1,2], [0, 1, 0, 1, 2]) 3 """ i, n, m = -1, len(seq), len(subseq) try: while True: i = seq.index(subseq[0], i + 1, n - m + 1) if subseq == seq[i:i + m]: return i except ValueError: return -1 if __name__ == '__main__': import doctest; doctest.testmod()

我不知道在这种情况下有多大？

Knuth-Morris-Prattstring匹配

 >>> def seq_in_seq(subseq, seq): ... while subseq[0] in seq: ... index = seq.index(subseq[0]) ... if subseq == seq[index:index + len(subseq)]: ... return index ... else: ... seq = seq[index + 1:] ... else: ... return -1 ... >>> seq_in_seq([5,6], [4,'a',3,5,6]) 3 >>> seq_in_seq([5,7], [4,'a',3,5,6]) -1

对不起，我不是一个algorithm专家，这只是我脑海里此刻想到的最快速的事情，至less我认为它看起来不错（对我来说），而且我很乐意编码它。 😉

最有可能的是你的蛮力方法正在做同样的事情。

一个简单的方法：转换为string，并依靠string匹配。

使用string列表的示例：

  >>> f = ["foo", "bar", "baz"] >>> g = ["foo", "bar"] >>> ff = str(f).strip("[]") >>> gg = str(g).strip("[]") >>> gg in ff True

使用string元组的示例：

 >>> x = ("foo", "bar", "baz") >>> y = ("bar", "baz") >>> xx = str(x).strip("()") >>> yy = str(y).strip("()") >>> yy in xx True

使用数字列表的示例：

 >>> f = [1 , 2, 3, 4, 5, 6, 7] >>> g = [4, 5, 6] >>> ff = str(f).strip("[]") >>> gg = str(g).strip("[]") >>> gg in ff True

蛮力可能适用于小图案。

对于较大的，请看Aho-Corasickalgorithm 。

这是另一个KMP执行：

 from itertools import tee def seq_in_seq(seq1,seq2): ''' Return the index where seq1 appears in seq2, or -1 if seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm based heavily on code by Neale Pickett <neale@woozle.org> found at: woozle.org/~neale/src/python/kmp.py >>> seq_in_seq(range(3),range(5)) 0 >>> seq_in_seq(range(3)[-1:],range(5)) 2 >>>seq_in_seq(range(6),range(5)) -1 ''' def compute_prefix_function(p): m = len(p) pi = [0] * m k = 0 for q in xrange(1, m): while k > 0 and p[k] != p[q]: k = pi[k - 1] if p[k] == p[q]: k = k + 1 pi[q] = k return pi t,p = list(tee(seq2)[0]), list(tee(seq1)[0]) m,n = len(p),len(t) pi = compute_prefix_function(p) q = 0 for i in range(n): while q > 0 and p[q] != t[i]: q = pi[q - 1] if p[q] == t[i]: q = q + 1 if q == m: return i - m + 1 return -1

我晚了一点，但这里有一些简单的使用string：

 >>> def seq_in_seq(sub, full): ... f = ''.join([repr(d) for d in full]).replace("'", "") ... s = ''.join([repr(d) for d in sub]).replace("'", "") ... #return f.find(s) #<-- not reliable for finding indices in all cases ... return s in f ... >>> seq_in_seq([5,6], [4,'a',3,5,6]) True >>> seq_in_seq([5,7], [4,'a',3,5,6]) False >>> seq_in_seq([4,'abc',33], [4,'abc',33,5,6]) True

正如Ilya V. Schurov指出的，在这种情况下， find方法不会返回多string或多位数字的正确索引。

另一种方法，使用集合：

 set([5,6])== set([5,6])&set([4,'a',3,5,6]) True

确定一个序列是否在Python中是另一个序列的最好方法

正确的方法来检测序列参数？

在Oracle中自动递增而不使用触发器

在Serializable C＃类中使用不带XmlArray的XmlArrayItem属性

如何用C＃中的整数序列创build数组？

更改字典中某个键的名称

我怎样才能得到在Python中的两个列表的连接而不修改任何一个？

置换algorithm没有recursion？ Java的

理解扁平序列的序列？

Postgres手动改变序列

Django Rest框架 – 在序列化器中获取相关的模型字段