C#, List<T>.Contains() - too slow?

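Could anyone explain to me why the generic List's Contains() function is so slow?
I have a List with about a million numbers, and code that is constantly checking whether a specific number is among them.
I tried doing the same thing using a Dictionary and its ContainsKey() function, and it was about 10-20 times faster than with the List.
Of course, I don't really want to use a Dictionary for this purpose, because it wasn't meant to be used that way.
So, the real question here is: is there any alternative to List<T>.Contains() that isn't as whacky as Dictionary<TKey,TValue>.ContainsKey()?
Thanks in advance!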

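If you are just checking for existence, HashSet<T> in .NET 3.5 is your best option. It gives Dictionary-like performance, but with no key/value pair, just the values:

    HashSet<int> data = new HashSet<int>();
    Random rand = new Random(); // source of random test values
    for (int i = 0; i < 1000000; i++)
    {
        data.Add(rand.Next(50000000));
    }
    bool contains = data.Contains(1234567); // etc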

List<T>.Contains is an O(n) operation.

Dictionary.ContainsKey is an O(1) operation on average. It uses the key's hash code to locate the entry, which gives you much faster lookups.
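The sketch below makes the O(n) versus O(1) difference concrete. `LinearContains` is a hypothetical, simplified helper in the spirit of List<T>.Contains; it is not the actual BCL source. It calls the element comparer once per item until it finds a match, so the worst case touches all n elements. A HashSet<T> lookup instead hashes the value once and inspects only the matching bucket.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a simplified linear search, similar in spirit to List<T>.Contains.
// It may compare against every element, so its cost grows linearly with the list size (O(n)).
static bool LinearContains<T>(IReadOnlyList<T> items, T value)
{
    EqualityComparer<T> comparer = EqualityComparer<T>.Default;
    for (int i = 0; i < items.Count; i++)
    {
        if (comparer.Equals(items[i], value))
            return true;
    }
    return false;
}

var numbers = new List<int> { 5, 3, 8, 1 };
var numberSet = new HashSet<int>(numbers); // hash-based: O(1) lookups on average

Console.WriteLine(LinearContains(numbers, 8)); // True
Console.WriteLine(numberSet.Contains(8));      // True
Console.WriteLine(numberSet.Contains(42));     // False
```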

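I don't think having a list with a million entries is a good idea. I don't think the List class was designed for that. 🙂

Isn't it possible to save these million entities into an RDBMS, for instance, and run your queries against that database?

If that isn't possible, then I would use a Dictionary anyway.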

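I think I have the answer! Yes, it's true that Contains() on a list (array) is O(n), but if the array is short and you are using value types, it should still be quite fast. However, using the CLR Profiler (a free download from Microsoft), I discovered that Contains() boxes values in order to compare them. Boxing requires a heap allocation, which is very expensive (slow). [Note: This is .NET 2.0; other .NET versions were not tested.]

Here's the full story and the solution. We have an enumeration called "VI", and we made a class called "ValueIdList", which is an abstract type for a list (array) of VI objects. The original implementation dates from the ancient .NET 1.1 days, and it used an encapsulated ArrayList. We recently discovered at http://blogs.msdn.com/b/joshwil/archive/2004/04/13/112598.aspx that a generic list (List<VI>) performs much better than ArrayList on value types (like our enum VI), because the values don't have to be boxed. It's true, and it worked... almost.

The CLR Profiler revealed a surprise. Here's a portion of the Allocation Graph: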

    ValueIdList::Contains bool(VI) 5.5MB (34.81%)
    Generic.List::Contains bool(<UNKNOWN>) 5.5MB (34.81%)
    Generic.ObjectEqualityComparer<T>::Equals bool(<UNKNOWN> <UNKNOWN>) 5.5MB (34.88%)
    Values.VI 7.7MB (49.03%)

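As you can see, Contains() surprisingly calls Generic.ObjectEqualityComparer.Equals(), which apparently requires boxing a VI value, and that requires an expensive heap allocation. It's odd that Microsoft would eliminate boxing in the list, only to require it again for a simple operation such as this.

Our solution was to rewrite the Contains() implementation. In our case that was easy, because we were already encapsulating the generic list object (_items). Here's the simple code: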

    // _items is the encapsulated List<VI>. VI is an enum, so == compares the values directly, without boxing.
    public bool Contains(VI id)
    {
        return IndexOf(id) >= 0;
    }

    public int IndexOf(VI id)
    {
        int count = _items.Count;
        for (int i = 0; i < count; i++)
            if (_items[i] == id)
                return i;
        return -1;
    }

    public bool Remove(VI id)
    {
        int i = IndexOf(id);
        if (i < 0)
            return false;
        _items.RemoveAt(i);
        return true;
    }

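The comparison of VI values is now done in our own version of IndexOf(). It requires no boxing, and it's very fast. Our particular program sped up by 20% after this simple rewrite. O(n)... no problem! Just avoid the wasted memory usage!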
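The same idea can be written generically for element types that implement IEquatable<T>, as int, long, string and most other built-in types do. The sketch below illustrates that assumption; it is not the answerer's code, and the helper name `IndexOfEquatable` is hypothetical. Constraining T to IEquatable<T> lets the compiler call the strongly typed Equals(T) directly, so value types are compared without boxing. Enums such as VI do not implement IEquatable<T>, which is why the hand-written `==` comparison in IndexOf() above is the simpler fix for that case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: linear search that calls IEquatable<T>.Equals(T) through the
// generic constraint, so value-type elements are not boxed during comparison.
// Assumes the list holds no null references (only relevant when T is a reference type).
static int IndexOfEquatable<T>(List<T> items, T value) where T : IEquatable<T>
{
    for (int i = 0; i < items.Count; i++)
    {
        if (items[i].Equals(value))
            return i;
    }
    return -1;
}

var ids = new List<int> { 10, 20, 30 };
Console.WriteLine(IndexOfEquatable(ids, 20)); // 1
Console.WriteLine(IndexOfEquatable(ids, 99)); // -1
```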

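A Dictionary isn't that bad, because the keys in a dictionary are designed to be found fast. To find a number in a list, you need to iterate through the whole list.

Of course, the dictionary only works if your numbers are unique and you don't need them kept in order.

I think there is also a HashSet<T> class in .NET 3.5, and it also allows only unique elements.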

A SortedList will be faster to search, because it uses binary search (O(log n)), but slower to insert items into.
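The same sorted-search idea works on a plain List<T>, because List<T>.BinarySearch returns a non-negative index when the value is found. The sketch below assumes the list is sorted once up front and that lookups greatly outnumber insertions; each sorted insert still shifts elements, so it costs O(n). The helper names `SortedContains` and `SortedInsert` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: O(log n) membership test; requires `sorted` to be in ascending order.
static bool SortedContains(List<int> sorted, int value) => sorted.BinarySearch(value) >= 0;

// Hypothetical helper: inserts while keeping the list sorted.
// BinarySearch returns the bitwise complement of the insertion point when the value is missing.
static void SortedInsert(List<int> sorted, int value)
{
    int index = sorted.BinarySearch(value);
    sorted.Insert(index >= 0 ? index : ~index, value);
}

var data = new List<int> { 42, 7, 19, 3 };
data.Sort(); // 3, 7, 19, 42

SortedInsert(data, 11); // 3, 7, 11, 19, 42
Console.WriteLine(SortedContains(data, 19)); // True
Console.WriteLine(SortedContains(data, 20)); // False
```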

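This isn't exactly an answer to your question, but I have a class that improves the performance of Contains() on a collection. I subclassed a Queue and added a Dictionary that maps hash codes to lists of objects. The Dictionary.ContainsKey() function is O(1) on average, whereas List.Contains(), Queue.Contains(), and Stack.Contains() are O(n).

The value type of the dictionary is a queue holding the objects that share the same hash code. The caller can supply a custom object that implements IEqualityComparer<T>; otherwise the default equality comparison is used. You could use this pattern for a Stack or a List as well; the code would only need a few changes.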

    /// <summary>
    /// This is a class that mimics a queue, except the Contains() operation is O(1) on average
    /// rather than O(n), thanks to an internal dictionary.
    /// The dictionary remembers the hash codes of the items that are currently enqueued.
    /// Hash code collisions are stored in a queue to maintain FIFO order.
    /// Note: Enqueue/Dequeue/Contains/Clear hide (not override) the Queue<T> members,
    /// so calls made through a Queue<T> reference bypass the dictionary.
    /// </summary>
    /// <typeparam name="T">The type of the queued items.</typeparam>
    private class HashQueue<T> : Queue<T>
    {
        private readonly IEqualityComparer<T> _comp;
        // _hashes.Count doesn't always equal base.Count (due to collisions and duplicates)
        private readonly Dictionary<int, Queue<T>> _hashes;

        public HashQueue(IEqualityComparer<T> comp = null) : base()
        {
            _comp = comp ?? EqualityComparer<T>.Default;
            _hashes = new Dictionary<int, Queue<T>>();
        }

        public HashQueue(int capacity, IEqualityComparer<T> comp = null) : base(capacity)
        {
            _comp = comp ?? EqualityComparer<T>.Default;
            _hashes = new Dictionary<int, Queue<T>>(capacity);
        }

        public HashQueue(IEnumerable<T> collection, IEqualityComparer<T> comp = null) : base(collection)
        {
            _comp = comp ?? EqualityComparer<T>.Default;
            _hashes = new Dictionary<int, Queue<T>>(base.Count);
            foreach (var item in this) // iterate the copied items so the source is enumerated only once
                EnqueueDictionary(item);
        }

        public new void Enqueue(T item)
        {
            base.Enqueue(item);      // add to queue
            EnqueueDictionary(item); // record its hash code
        }

        private void EnqueueDictionary(T item)
        {
            int hash = _comp.GetHashCode(item);
            Queue<T> bucket;
            if (!_hashes.TryGetValue(hash, out bucket))
            {
                bucket = new Queue<T>();
                _hashes.Add(hash, bucket);
            }
            bucket.Enqueue(item);
        }

        public new T Dequeue()
        {
            T result = base.Dequeue(); // remove from queue
            int hash = _comp.GetHashCode(result);
            Queue<T> bucket;
            if (_hashes.TryGetValue(hash, out bucket))
            {
                bucket.Dequeue(); // the oldest item with this hash is the one just dequeued
                if (bucket.Count == 0)
                    _hashes.Remove(hash);
            }
            return result;
        }

        public new bool Contains(T item)
        {
            // O(1) on average (one hash lookup plus a scan of the few colliding items),
            // whereas Queue.Contains is O(n)
            Queue<T> bucket;
            if (!_hashes.TryGetValue(_comp.GetHashCode(item), out bucket))
                return false;
            foreach (T candidate in bucket) // equal hash codes alone do not prove equality
                if (_comp.Equals(candidate, item))
                    return true;
            return false;
        }

        public new void Clear()
        {
            foreach (var bucket in _hashes.Values)
                bucket.Clear(); // clear collision lists
            _hashes.Clear();    // clear dictionary
            base.Clear();       // clear queue
        }
    }

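My simple test shows that my HashQueue.Contains() runs much faster than Queue.Contains(). Running the test code with count set to 10,000, the HashQueue version takes 0.00045 seconds and the Queue version takes 0.37 seconds. With a count of 100,000, the HashQueue version takes 0.0031 seconds, whereas the Queue takes 36.38 seconds!

Here's my test code: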

    static void Main(string[] args)
    {
        int count = 10000;

        { // HashQueue
            var q = new HashQueue<int>(count);
            for (int i = 0; i < count; i++) // load queue (not timed)
                q.Enqueue(i);

            System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                bool contains = q.Contains(i);
            }
            sw.Stop();
            Console.WriteLine(string.Format("HashQueue, {0}", sw.Elapsed));
        }

        { // Queue
            var q = new Queue<int>(count);
            for (int i = 0; i < count; i++) // load queue (not timed)
                q.Enqueue(i);

            System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                bool contains = q.Contains(i);
            }
            sw.Stop();
            Console.WriteLine(string.Format("Queue, {0}", sw.Elapsed));
        }

        Console.ReadLine();
    }
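Another way to get a fast Contains() on a queue is sketched below. It is an alternative technique, not the answerer's design. A Queue<T> is paired with a Dictionary<T, int> that counts how many copies of each item are currently queued. The dictionary then does the hashing and the equality checks itself, so there is no separate collision list to manage. The helper names (`EnqueueCounted`, `DequeueCounted`, `ContainsCounted`) are hypothetical, and items are assumed to be non-null because Dictionary keys cannot be null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers: a Queue<T> paired with a Dictionary<T, int> that counts
// how many copies of each item are currently queued. Items must be non-null.
static void EnqueueCounted<T>(Queue<T> queue, Dictionary<T, int> counts, T item) where T : notnull
{
    queue.Enqueue(item);
    counts[item] = counts.TryGetValue(item, out int n) ? n + 1 : 1;
}

static T DequeueCounted<T>(Queue<T> queue, Dictionary<T, int> counts) where T : notnull
{
    T item = queue.Dequeue(); // throws InvalidOperationException if the queue is empty
    int n = counts[item];
    if (n == 1)
        counts.Remove(item);  // last copy gone: the item is no longer contained
    else
        counts[item] = n - 1;
    return item;
}

// O(1) on average: the dictionary hashes the item and checks equality for us.
static bool ContainsCounted<T>(Dictionary<T, int> counts, T item) where T : notnull
    => counts.ContainsKey(item);

var jobs = new Queue<string>();
var jobCounts = new Dictionary<string, int>(); // pass an IEqualityComparer<string> here to customize equality
EnqueueCounted(jobs, jobCounts, "a");
EnqueueCounted(jobs, jobCounts, "b");
EnqueueCounted(jobs, jobCounts, "a");

DequeueCounted(jobs, jobCounts);                 // removes the first "a"
Console.WriteLine(ContainsCounted(jobCounts, "a")); // True (the second "a" is still queued)
DequeueCounted(jobs, jobCounts);                 // removes "b"
Console.WriteLine(ContainsCounted(jobCounts, "b")); // False
```

Compared with keying on hash codes, this design stores each distinct item once as a dictionary key. The trade-off is an extra count per distinct item.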

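Why is a dictionary inappropriate?

To see whether a particular value is in the list, you need to walk the entire list. With a dictionary (or another hash-based container), it's much quicker to narrow down the number of objects you need to compare against. The key (in your case, the number) is hashed, and the hash gives the dictionary a small subset of the objects to compare against.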
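The narrowing step can be illustrated with a toy hash table. This is a simplified sketch of the general technique, not how Dictionary<TKey,TValue> is actually implemented internally. The bucket count of 16 is an arbitrary assumption. Each number goes into the bucket chosen by its hash code, so a lookup compares against only the numbers in one bucket instead of all of them.

```csharp
using System;
using System.Collections.Generic;

// A toy hash set of ints: an array of buckets, each holding the values whose hash lands there.
var buckets = new List<int>[16]; // 16 buckets: an arbitrary size for this illustration
for (int b = 0; b < buckets.Length; b++)
    buckets[b] = new List<int>();

// Picks a bucket from the hash code; masking off the sign bit keeps the index non-negative.
static int BucketOf(int value, int bucketCount) => (value.GetHashCode() & int.MaxValue) % bucketCount;

void Add(int value)
{
    List<int> bucket = buckets[BucketOf(value, buckets.Length)];
    if (!bucket.Contains(value)) // keep values unique, like a set
        bucket.Add(value);
}

// Only the values in one bucket are compared, not the whole collection.
bool Contains(int value) => buckets[BucketOf(value, buckets.Length)].Contains(value);

for (int i = 0; i < 1000; i++)
    Add(i * 7);

Console.WriteLine(Contains(700)); // True
Console.WriteLine(Contains(701)); // False
Console.WriteLine(buckets[BucketOf(700, buckets.Length)].Count); // 63 (out of 1000 values)
```

A real hash table also grows its bucket array as it fills, so that each bucket stays small.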

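I'm using this in the Compact Framework, which has no support for HashSet, so I opted for a Dictionary where both strings (key and value) are the value I am looking for.

It means I get List<> functionality with Dictionary performance. It's a bit hacky, but it works.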
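The sketch below shows that workaround. It assumes string values, as in the answer, and stores each string as both key and value, following the answer. The dictionary name `seen` and the helper `AddValue` are hypothetical, and the top-level syntax is modern C# used for brevity. Membership tests go through ContainsKey, which gives the hashed lookup.

```csharp
using System;
using System.Collections.Generic;

// A Dictionary used as a set: each string is stored as both key and value,
// and only the keys matter for membership tests.
var seen = new Dictionary<string, string>();

// The indexer adds a new key, or overwrites the identical entry for a duplicate.
void AddValue(string value) => seen[value] = value;

AddValue("alpha");
AddValue("beta");
AddValue("alpha"); // duplicate: no exception, no second entry

Console.WriteLine(seen.ContainsKey("beta"));  // True
Console.WriteLine(seen.ContainsKey("gamma")); // False
Console.WriteLine(seen.Count);                // 2
```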
