检查平面列表中的重复项

例1

['one', 'two', 'one']应该返回True 。

例2

['one', 'two', 'three']应该返回False 。

如果所有值都是可散列的，则使用set()删除重复项：

 >>> your_list = ['one', 'two', 'one'] >>> len(your_list) != len(set(your_list)) True

仅推荐用于短名单：

 any(thelist.count(x) > 1 for x in thelist)

不要使用长列表 – 它可能需要时间与列表中的项目数量的平方成比例！

对于具有可哈希项目（string，数字和＆c）的较长列表：

 def anydup(thelist): seen = set() for x in thelist: if x in seen: return True seen.add(x) return False

如果你的物品不可散列（子列表，字典等），它会变得更毛发，但如果它们至less可以比较，仍然可能得到O（N logN）。但是您需要知道或testing项目的特性（可否或不可，可比或不可），以获得最佳的性能 – O（N）为可信，O（N log N）为非可比可比，否则这是下降到O（N平方），没有人能做到这一点:-(。

这是旧的，但这里的答案让我有一个稍微不同的解决scheme。如果你正在滥用理解，你可以通过这种方式进行短路。

 xs = [1, 2, 1] s = set() any(x in s or s.add(x) for x in xs) # You can use a similar approach to actually retrieve the duplicates. s = set() duplicates = set(x for x in xs if x in s or s.add(x))

如果您喜欢函数式编程风格，那么这里是一个有用的函数，使用doctest进行自我logging和testing的代码。

 def decompose(a_list): """Turns a list into a set of all elements and a set of duplicated elements. Returns a pair of sets. The first one contains elements that are found at least once in the list. The second one contains elements that appear more than once. >>> decompose([1,2,3,5,3,2,6]) (set([1, 2, 3, 5, 6]), set([2, 3])) """ return reduce( lambda (u, d), o : (u.union([o]), d.union(u.intersection([o]))), a_list, (set(), set())) if __name__ == "__main__": import doctest doctest.testmod()

从那里你可以通过检查返回对的第二个元素是否为空来testingunicity：

 def is_set(l): """Test if there is no duplicate element in l. >>> is_set([1,2,3]) True >>> is_set([1,2,1]) False >>> is_set([]) True """ return not decompose(l)[1]

请注意，这是不高效的，因为你明确地构造分解。但是沿着使用reduce的路线，你可以find相当的答案（但效率稍低）来回答5：

 def is_set(l): try: def func(s, o): if o in s: raise Exception return s.union([o]) reduce(func, l, set()) return True except: return False

我最近回答了一个相关的问题，使用一个生成器来build立列表中的所有重复项。它的优点是，如果只是用来build立“如果有重复”，那么你只需要得到第一个项目，其余的可以被忽略，这是最终的捷径。

这是一个有趣的基于集合的方法，我直接从moooeeeep改编：

 def getDupes(l): seen = set() seen_add = seen.add for x in l: if x in seen or seen_add(x): yield x

因此，完整的列表将被list(getDupes(etc)) 。要简单地testing“如果”有一个愚蠢的，它应该包装如下：

 def hasDupes(l): try: if getDupes(c).next(): return True # Found a dupe except StopIteration: pass return False

这个规模很好，并提供一致的操作时间，无论是在列表中的愚蠢 – 我testing了多达1米的条目列表。如果你知道关于数据的一些事情，特别是那些愚蠢的事情可能会在上半年出现，或者其他的事情让你歪曲你的需求，比如需要得到真正的愚蠢，那么有一些真正替代的愚蠢的定位器那可能会跑赢大盘我推荐的两个是…

简单的基于字典的方法，非常可读：

 def getDupes(c): d = {} for i in c: if i in d: if d[i]: yield i d[i] = False else: d[i] = True

利用itertools（本质上是一个ifilter / izip / tee）sorting列表，非常有效的，如果你得到的所有愚蠢，但不是很快得到第一：

 def getDupes(c): a, b = itertools.tee(sorted(c)) next(b, None) r = None for k, g in itertools.ifilter(lambda x: x[0]==x[1], itertools.izip(a, b)): if k != r: yield k r = k

这些人是我为全副手表所尝试的方法中performance最好的，从头到尾，第一个重复出现在1米的元素列表中。令人惊讶的是，在分拣步骤中增加了很less的开销。你的里程可能会有所不同，但这里是我的具体计时结果：

 Finding FIRST duplicate, single dupe places "n" elements in to 1m element array Test set len change : 50 - . . . . . -- 0.002 Test in dict : 50 - . . . . . -- 0.002 Test in set : 50 - . . . . . -- 0.002 Test sort/adjacent : 50 - . . . . . -- 0.023 Test sort/groupby : 50 - . . . . . -- 0.026 Test sort/zip : 50 - . . . . . -- 1.102 Test sort/izip : 50 - . . . . . -- 0.035 Test sort/tee/izip : 50 - . . . . . -- 0.024 Test moooeeeep : 50 - . . . . . -- 0.001 * Test iter*/sorted : 50 - . . . . . -- 0.027 Test set len change : 5000 - . . . . . -- 0.017 Test in dict : 5000 - . . . . . -- 0.003 * Test in set : 5000 - . . . . . -- 0.004 Test sort/adjacent : 5000 - . . . . . -- 0.031 Test sort/groupby : 5000 - . . . . . -- 0.035 Test sort/zip : 5000 - . . . . . -- 1.080 Test sort/izip : 5000 - . . . . . -- 0.043 Test sort/tee/izip : 5000 - . . . . . -- 0.031 Test moooeeeep : 5000 - . . . . . -- 0.003 * Test iter*/sorted : 5000 - . . . . . -- 0.031 Test set len change : 50000 - . . . . . -- 0.035 Test in dict : 50000 - . . . . . -- 0.023 Test in set : 50000 - . . . . . -- 0.023 Test sort/adjacent : 50000 - . . . . . -- 0.036 Test sort/groupby : 50000 - . . . . . -- 0.134 Test sort/zip : 50000 - . . . . . -- 1.121 Test sort/izip : 50000 - . . . . . -- 0.054 Test sort/tee/izip : 50000 - . . . . . -- 0.045 Test moooeeeep : 50000 - . . . . . -- 0.019 * Test iter*/sorted : 50000 - . . . . . -- 0.055 Test set len change : 500000 - . . . . . -- 0.249 Test in dict : 500000 - . . . . . -- 0.145 Test in set : 500000 - . . . . . -- 0.165 Test sort/adjacent : 500000 - . . . . . -- 0.139 Test sort/groupby : 500000 - . . . . . -- 1.138 Test sort/zip : 500000 - . . . . . -- 1.159 Test sort/izip : 500000 - . . . . . -- 0.126 Test sort/tee/izip : 500000 - . . . . . -- 0.120 * Test moooeeeep : 500000 - . . . . . -- 0.131 Test iter*/sorted : 500000 - . . . . . -- 0.157

检查平面列表中的重复项

例1

例2

将另一个列表sorting

R中的数据框和列表有什么区别？

Pythonic的方式来确定是否不为空列表条目是“连续的”

检查string是否以列表中的某个string结尾

番石榴：为什么没有Lists.filter（）函数？

div里面的列表是否允许？

在C＃中排列对象列表

在JavaScript中进行分区

HashSet <T>和List <T>有什么区别？

Python中的recursion基础