在C#中从List <T>中selectN个随机元素

我需要一个快速algorithm从通用列表中select5个随机元素。 例如,我想从List<string>获得5个随机元素。

迭代遍历和为每个元素使select的概率=(需要的数量)/(数字剩下)

所以如果你有40个项目,第一个将有5/40的机会被选中。 如果是,下一个有4/39的机会,否则有5/39的机会。 当你到达最后,你会得到你的5件物品,而且在这之前,你通常会拥有所有的物品。

使用linq:

 YourList.OrderBy(x => rnd.Next()).Take(5) 

这实际上是一个比听起来更难的问题,主要是因为许多math上正确的解决scheme将无法真正让你击中所有的可能性(下面更多)。

首先,这里有一些易于实现,正确的,如果你有一个真正的随机数发生器:

(0)凯尔的答案是O(n)。

(1)生成n对[(0,rand),(1,rand),(2,rand),…]的列表,按第二个坐标对它们进行sorting,并使用第一个k(对于你,k = 5)索引来获得你的随机子集。 我认为这很容易实现,尽pipe它是O(n log n)时间。

(2)初始化一个空的列表s = [],它将成长为k个随机元素的索引。 随机select{0,1,2,…,n-1}中的一个数字r,r = rand%n,并将其添加到s中。 接下来取r = rand%(n-1)并坚持s; 为了避免冲突,在s中添加小于#的元素。 接下来取r = rand%(n-2),并做同样的事情,等等,直到你在s中有k个不同的元素。 这是最糟糕的运行时间O(k ^ 2)。 所以对于k << n,这可以更快。 如果你保持sorting并跟踪它所具有的连续时间间隔,你可以在O(k log k)中实现它,但是它更多的工作。

@凯尔 – 你是对的,我想第二想到你的答案。 我一开始就匆匆读了一遍,错误地认为你是在指示按固定的概率k / n顺序select每个元素,这是错误的 – 但是你的适应性方法对我来说是正确的。 对于那个很抱歉。

好吧,现在踢球者:渐近(固定k,n增长),有n ^ k / k! selectn个元素中的k个元素子集[这是(nselectk)的近似值]。 如果n很大,k不是很小,那么这些数字是巨大的。 在任何标准的32位随机数发生器中,你所希望的最佳周期长度是2 ^ 32 = 256 ^ 4。 所以,如果我们有一个1000个元素的列表,并且我们想随机select5个,那么标准的随机数生成器将无法达到所有的可能性。 但是,只要你可以select适用于较小集合的选项,并且总是“看起来”随机,那么这些algorithm应该是可以的。

附录 :写完这个之后,我意识到正确执行想法(2)是非常棘手的,所以我想澄清这个答案。 要获得O(k log k)时间,需要一个支持O(log m)search和插入的类似数组的结构 – 平衡二叉树可以完成此操作。 使用这样的结构来构build一个名为s的数组,下面是一些伪代码:

 # Returns a container s with k distinct random numbers from {0, 1, ..., n-1} def ChooseRandomSubset(n, k): for i in range(k): r = UniformRandom(0, ni) # May be 0, must be < ni q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search. s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q. return s 

我build议通过几个例子来看看如何有效地实现上面的英文解释。

 public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount) { return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList(); } 

我认为所选的答案是正确的,非常可爱。 我虽然以不同的方式实现了它,但我也是以随机顺序的结果。

  static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>( IEnumerable<SomeType> someTypes, int maxCount) { Random random = new Random(DateTime.Now.Millisecond); Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>(); foreach(SomeType someType in someTypes) randomSortTable[random.NextDouble()] = someType; return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value); } 

从algorithm中的龙,在C#中的解释:

 int k = 10; // items to select var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }); var selected = new List<int>(); var needed = k; var available = items.Count; var rand = new Random(); while (selected.Count < k) { if( rand.NextDouble() < needed / available ) { selected.Add(items[available-1]) needed--; } available--; } 

该algorithm将select项目列表的唯一标记。

我只是遇到了这个问题,还有一些谷歌search带来了随机洗牌的问题: http : //en.wikipedia.org/wiki/Fisher-Yates_shuffle

要完全随机洗牌你的名单(就地),你这样做:

洗牌n个元素(索引0..n-1)的数组:

  for i from n − 1 downto 1 do j ← random integer with 0 ≤ j ≤ i exchange a[j] and a[i] 

如果你只需要前5个元素,那么你只需要将它运行到n-5(即:n-5),而不是从n-1运行到1,

比方说你需要k项,

这变成:

  for (i = n − 1; i >= nk; i--) { j = random integer with 0 ≤ j ≤ i exchange a[j] and a[i] } 

每个被select的项目都被交换到数组的末尾,所以select的k个元素是数组的最后k个元素。

这需要时间O(K),其中K是您需要随机select的元素的数量。

此外,如果您不想修改初始列表,则可以将所有交换logging在临时列表中,反转该列表并重新应用它们,从而执行反向交换集并将您的初始列表返回而不更改O(k)运行时间。

最后,对于真正的stickler,如果(n == k),你应该停在1,而不是nk,随机select的整数总是0。

你可以使用这个,但sorting会发生在客户端

  .AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5); 

正在考虑@JohnShedletsky关于接受的答案 (释义)的评论:

你应该能够在O(subset.Length)中,而不是O(originalList.Length)

基本上,你应该能够生成subset随机索引,然后从原始列表中摘取它们。

方法

 public static class EnumerableExtensions { public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) { return (list as T[] ?? list.ToArray()).GetRandom(numItems); // because ReSharper whined about duplicate enumeration... /* items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--; */ } // just because the parentheses were getting confusing public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) { var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list while (numItems > 0 ) // if we successfully added it, move on if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--; return items; } // and because it's really fun; note -- you may get repetition public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) { while( true ) yield return list.ElementAt(randomizer.Next(list.Count())); } } 

如果你想更有效率,你可能会使用索引HashSet ,而不是实际的列表元素(如果你有复杂的types或昂贵的比较);

unit testing

并确保我们没有任何碰撞等

 [TestClass] public class RandomizingTests : UnitTestBase { [TestMethod] public void GetRandomFromList() { this.testGetRandomFromList((list, num) => list.GetRandom(num)); } [TestMethod] public void PluckRandomly() { this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false); } private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) { var items = Enumerable.Range(0, 100); IEnumerable<int> randomItems = null; while( repetitions-- > 0 ) { randomItems = methodToGetRandomItems(items, numToTake); Assert.AreEqual(numToTake, randomItems.Count(), "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions); if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(), "Collisions (non-unique values) found, failed at {0} repetition--", repetitions); Assert.IsTrue(randomItems.All(o => items.Contains(o)), "Some unknown values found; failed at {0} repetition--", repetitions); } } } 

我使用的简单解决scheme(可能不适用于大型列表):将列表复制到临时列表中,然后在循环中随机从临时列表中select项目,并将其放入选定项目列表中,同时将其移除到临时列表中(因此它不能重新select)。

例:

 List<Object> temp = OriginalList.ToList(); List<Object> selectedItems = new List<Object>(); Random rnd = new Random(); Object o; int i = 0; while (i < NumberOfSelectedItems) { o = temp[rnd.Next(temp.Count)]; selectedItems.Add(o); temp.Remove(o); i++; } 

从一个组中selectN个随机项目不应该与订单有任何关系! 随机性是关于不可预测性的,而不是关于在一个团体中洗牌的。 处理某种有序的所有答案肯定会比那些不那么有效。 由于效率是关键,所以我会发布一些不会改变项目顺序的东西。

1)如果你需要真正的随机值,这意味着没有限制哪些元素可供select(即一旦select的项目可以重新select):

 public static List<T> GetTrueRandom<T>(this IList<T> source, int count, bool throwArgumentOutOfRangeException = true) { if (throwArgumentOutOfRangeException && count > source.Count) throw new ArgumentOutOfRangeException(); var randoms = new List<T>(count); randoms.AddRandomly(source, count); return randoms; } 

如果您设置了exception标志,那么您可以select任意次数的随机项目。

如果你有{1,2,3,4},那么它可以为{3,4},{1,4,3},{1,4,3}等等,甚至{1,4,3,2,4} 5件物品!

这应该是相当快,因为​​它没有任何检查。

2)如果你需要团队中的个别成员而没有重复,那么我会依靠一本字典(正如已经指出的那样)。

 public static List<T> GetDistinctRandom<T>(this IList<T> source, int count) { if (count > source.Count) throw new ArgumentOutOfRangeException(); if (count == source.Count) return new List<T>(source); var sourceDict = source.ToIndexedDictionary(); if (count > source.Count / 2) { while (sourceDict.Count > count) sourceDict.Remove(source.GetRandomIndex()); return sourceDict.Select(kvp => kvp.Value).ToList(); } var randomDict = new Dictionary<int, T>(count); while (randomDict.Count < count) { int key = source.GetRandomIndex(); if (!randomDict.ContainsKey(key)) randomDict.Add(key, sourceDict[key]); } return randomDict.Select(kvp => kvp.Value).ToList(); } 

代码比这里的其他字典方法有点长,因为我不仅添加,而且从列表中删除,所以它有两个循环。 你可以在这里看到,当count等于source.Count时,我根本没有重新sorting 。 那是因为我相信随机性应该在整个返回集合 。 我的意思是如果你想从1, 2, 3, 4, 5 5个随机项目中select1, 2, 3, 4, 51, 2, 3, 4, 5都不重要,但是如果你需要的话在同一组中有4个项目,那么在1, 2, 3, 4等等中应该产生不可预料的产量。其次, 当要返回的随机项目的数量是超过一半的原始组,然后更容易删除源source.Count - count从组中 count项目比添加count项目。 出于性能方面的原因,我使用source来代替sourceDict ,然后在remove方法中获取随机索引。

所以如果你有{1,2,3,4},最后可以在{1,2,3},{3,4,1}等3个项目。

3)如果考虑到原始组中的副本,您需要从组中获得真正独特的随机值,那么可以使用与上述相同的方法,但是HashSet将比字典更轻。

 public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count, bool throwArgumentOutOfRangeException = true) { if (count > source.Count) throw new ArgumentOutOfRangeException(); var set = new HashSet<T>(source); if (throwArgumentOutOfRangeException && count > set.Count) throw new ArgumentOutOfRangeException(); List<T> list = hash.ToList(); if (count >= set.Count) return list; if (count > set.Count / 2) { while (set.Count > count) set.Remove(list.GetRandom()); return set.ToList(); } var randoms = new HashSet<T>(); randoms.AddRandomly(list, count); return randoms.ToList(); } 

randomsvariables被设置为HashSet以避免在Random.Next可以产生相同值的情况下,特别是在input列表很小的情况下,最稀有的情况下添加重复项。

所以{1,2,2,4} => 3个随机项目=> {1,2,4}而从不{1,2,2}

{1,2,2,4} => 4个随机项目=>exception! 或{1,2,4},具体取决于标志集。

我使用的一些扩展方法:

 static Random rnd = new Random(); public static int GetRandomIndex<T>(this ICollection<T> source) { return rnd.Next(source.Count); } public static T GetRandom<T>(this IList<T> source) { return source[source.GetRandomIndex()]; } static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count) { while (toCol.Count < count) toCol.Add(fromList.GetRandom()); } public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst) { return lst.ToIndexedDictionary(t => t); } public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst, Func<S, T> valueSelector) { int index = -1; return lst.ToDictionary(t => ++index, valueSelector); } 

如果它的performance必须迭代10000次,那么你可能希望有比System.Random 更快的随机类 ,但是我认为考虑后者最可能是没有什么大不了的从来不是瓶颈,它相当快。

编辑:如果你需要重新安排退货的顺序,那么没有什么可以打败达哈伊姆的费希尔 – 耶茨的方法 – 简短,甜美,简单。

基于凯尔的答案,这是我的C#实现。

 /// <summary> /// Picks random selection of available game ID's /// </summary> private static List<int> GetRandomGameIDs(int count) { var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"]; var totalGameIDs = gameIDs.Count(); if (count > totalGameIDs) count = totalGameIDs; var rnd = new Random(); var leftToPick = count; var itemsLeft = totalGameIDs; var arrPickIndex = 0; var returnIDs = new List<int>(); while (leftToPick > 0) { if (rnd.Next(0, itemsLeft) < leftToPick) { returnIDs .Add(gameIDs[arrPickIndex]); leftToPick--; } arrPickIndex++; itemsLeft--; } return returnIDs ; } 

这个方法可能相当于凯尔的。

说你的清单大小为n,你想要k个元素。

 Random rand = new Random(); for(int i = 0; k>0; ++i) { int r = rand.Next(0, ni); if(r<k) { //include element i k--; } } 

奇迹般有效 :)

– 亚历克斯·吉尔伯特

我结合了上面的几个答案来创build一个Lazily评估的扩展方法。 我的testing表明,Kyle的方法(Order(N))比drzaus使用一组来提出随机指数(Order(K))要慢很多倍。 前者对随机数发生器执行更多的调用,并且在物品上迭代更多次。

我执行的目标是:

1)如果给定一个不是IList的IEnumerable,则不要意识到完整列表。 如果我给了一个十亿个项目的序列,我不想耗尽内存。 使用Kyle的方法来进行在线解决scheme。

2)如果我能说出来是一个IList,那就用drzaus的方法吧。 如果K超过N的一半,我就会冒着打击的风险,因为我一次又一次地select许多随机指数,不得不跳过它们。 因此,我编写了一个不保留的索引列表。

3)我保证这些物品将按照遇到的顺序返回。 凯尔的algorithm不需要改变。 drzaus'algorithm要求我不按照select随机索引的顺序排列项目。 我将所有索引收集到一个SortedSet中,然后以sorting后的索引顺序排列项目。

4)如果K比N大,我反转了集合的意义,那么我列举所有项目,并testing索引是否不在集合中。 这意味着我失去了Order(K)运行时间,但是由于在这些情况下K接近于N,所以我不会损失太多。

这里是代码:

  /// <summary> /// Takes k elements from the next n elements at random, preserving their order. /// /// If there are fewer than n elements in items, this may return fewer than k elements. /// </summary> /// <typeparam name="TElem">Type of element in the items collection.</typeparam> /// <param name="items">Items to be randomly selected.</param> /// <param name="k">Number of items to pick.</param> /// <param name="n">Total number of items to choose from. /// If the items collection contains more than this number, the extra members will be skipped. /// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param> /// <returns>Enumerable over the retained items. /// /// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary. /// </returns> public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n) { var r = new FastRandom(); var itemsList = items as IList<TElem>; if (k >= n || (itemsList != null && k >= itemsList.Count)) foreach (var item in items) yield return item; else { // If we have a list, we can infer more information and choose a better algorithm. // When using an IList, this is about 7 times faster (on one benchmark)! if (itemsList != null && k < n/2) { // Since we have a List, we can use an algorithm suitable for Lists. // If there are fewer than n elements, reduce n. n = Math.Min(n, itemsList.Count); // This algorithm picks K index-values randomly and directly chooses those items to be selected. // If k is more than half of n, then we will spend a fair amount of time thrashing, picking // indices that we have already picked and having to try again. var invertSet = k >= n/2; var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>(); var numbersNeeded = invertSet ? n - k : k; while (numbersNeeded > 0) if (positions.Add(r.Next(0, n))) numbersNeeded--; if (invertSet) { // positions contains all the indices of elements to Skip. for (var itemIndex = 0; itemIndex < n; itemIndex++) { if (!positions.Contains(itemIndex)) yield return itemsList[itemIndex]; } } else { // positions contains all the indices of elements to Take. foreach (var itemIndex in positions) yield return itemsList[itemIndex]; } } else { // Since we do not have a list, we will use an online algorithm. // This permits is to skip the rest as soon as we have enough items. var found = 0; var scanned = 0; foreach (var item in items) { var rand = r.Next(0,n-scanned); if (rand < k - found) { yield return item; found++; } scanned++; if (found >= k || scanned >= n) break; } } } } 

我使用了一个专门的随机数生成器,但是如果你愿意的话,你可以使用C#的Random 。 ( FastRandom是由Colin Green编写的,是SharpNEAT的一部分,它的周期为2 ^ 128-1,优于许多RNG。)

这里是unit testing:

 [TestClass] public class TakeRandomTests { /// <summary> /// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability. /// </summary> [TestMethod] public void TakeRandom_Array_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials/20; var timesChosen = new int[100]; var century = new int[100]; for (var i = 0; i < century.Length; i++) century[i] = i; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in century.TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount/100; AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average"); //AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min"); //AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max"); var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } /// <summary> /// Ensure that when randomly choosing items from an IEnumerable that is not an IList, /// all items are chosen with roughly equal probability. /// </summary> [TestMethod] public void TakeRandom_IEnumerable_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials / 20; var timesChosen = new int[100]; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in Range(0,100).TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount / 100; var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } private IEnumerable<int> Range(int low, int count) { for (var i = low; i < low + count; i++) yield return i; } private static void AssertBetween(int x, int low, int high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } private static void AssertBetween(double x, double low, double high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } } 

这是我可以在第一次切割时得到的最好的:

 public List<String> getRandomItemsFromList(int returnCount, List<String> list) { List<String> returnList = new List<String>(); Dictionary<int, int> randoms = new Dictionary<int, int>(); while (randoms.Count != returnCount) { //generate new random between one and total list count int randomInt = new Random().Next(list.Count); // store this in dictionary to ensure uniqueness try { randoms.Add(randomInt, randomInt); } catch (ArgumentException aex) { Console.Write(aex.Message); } //we can assume this element exists in the dictonary already //check for randoms length and then iterate through the original list //adding items we select via random to the return list if (randoms.Count == returnCount) { foreach (int key in randoms.Keys) returnList.Add(list[randoms[key]]); break; //break out of _while_ loop } } return returnList; } 

使用1范围内的随机列表 – 总列表数量,然后简单地拉出列表中的项目似乎是最好的方法,但使用字典确保唯一性是我仍在思考的东西。

还要注意我使用了一个string列表,根据需要进行replace

为什么不是这样的事情:

  Dim ar As New ArrayList Dim numToGet As Integer = 5 'hard code just to test ar.Add("12") ar.Add("11") ar.Add("10") ar.Add("15") ar.Add("16") ar.Add("17") Dim randomListOfProductIds As New ArrayList Dim toAdd As String = "" For i = 0 To numToGet - 1 toAdd = ar(CInt((ar.Count - 1) * Rnd())) randomListOfProductIds.Add(toAdd) 'remove from id list ar.Remove(toAdd) Next 'sorry i'm lazy and have to write vb at work :( and didn't feel like converting to c# 

It is a lot harder than one would think. See the great Article "Shuffling" from Jeff.

I did write a very short article on that subject including C# code:
Return random subset of N elements of a given array

Here you have one implementation based on Fisher-Yates Shuffle whose algorithm complexity is O(n) where n is the subset or sample size, instead of the list size, as John Shedletsky pointed out.

 public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize) { if (list == null) throw new ArgumentNullException("list"); if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize"); var indices = new Dictionary<int, int>(); int index; var rnd = new Random(); for (int i = 0; i < sampleSize; i++) { int j = rnd.Next(i, list.Count); if (!indices.TryGetValue(j, out index)) index = j; yield return list[index]; if (!indices.TryGetValue(i, out index)) index = i; indices[j] = index; } } 

I recently did this on my project using an idea similar to Tyler's point 1 .
I was loading a bunch of questions and selecting five at random. Sorting was achieved using an IComparer .
aAll questions were loaded in the a QuestionSorter list, which was then sorted using the List's Sort function and the first k elements where selected.

  private class QuestionSorter : IComparable<QuestionSorter> { public double SortingKey { get; set; } public Question QuestionObject { get; set; } public QuestionSorter(Question q) { this.SortingKey = RandomNumberGenerator.RandomDouble; this.QuestionObject = q; } public int CompareTo(QuestionSorter other) { if (this.SortingKey < other.SortingKey) { return -1; } else if (this.SortingKey > other.SortingKey) { return 1; } else { return 0; } } } 

用法:

  List<QuestionSorter> unsortedQuestions = new List<QuestionSorter>(); // add the questions here unsortedQuestions.Sort(unsortedQuestions as IComparer<QuestionSorter>); // select the first k elements 

Here's my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).

It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:

 public <T> List<T> take(List<T> source, int k) { int n = source.size(); if (k > n) { throw new IllegalStateException( "Can not take " + k + " elements from a list with " + n + " elements"); } List<T> result = new ArrayList<T>(k); Map<Integer,Integer> used = new HashMap<Integer,Integer>(); int metric = 0; for (int i = 0; i < k; i++) { int off = random.nextInt(n - i); while (true) { metric++; Integer redirect = used.put(off, n - i - 1); if (redirect == null) { break; } off = redirect; } result.add(source.get(off)); } assert metric <= 2*k; return result; } 

This isn't as elegant or efficient as the accepted solution, but it's quick to write up. First, permute the array randomly, then select the first K elements. In python,

 import numpy N = 20 K = 5 idx = np.arange(N) numpy.random.shuffle(idx) print idx[:K] 

I would use an extension method.

  public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake) { var random = new Random(); var internalList = elements.ToList(); var selected = new List<T>(); for (var i = 0; i < countToTake; ++i) { var next = random.Next(0, internalList.Count - selected.Count); selected.Add(internalList[next]); internalList[next] = internalList[internalList.Count - selected.Count]; } return selected; } 
 public static IEnumerable<T> GetRandom<T>(this IList<T> list, int count, Random random) { // Probably you should throw exception if count > list.Count count = Math.Min(list.Count, count); var selectedIndices = new SortedSet<int>(); // Random upper bound int randomMax = list.Count - 1; while (selectedIndices.Count < count) { int randomIndex = random.Next(0, randomMax); // skip over already selected indeces foreach (var selectedIndex in selectedIndices) if (selectedIndex <= randomIndex) ++randomIndex; else break; yield return list[randomIndex]; selectedIndices.Add(randomIndex); --randomMax; } } 

Memory: ~count
Complexity: O(count 2 )

When N is very large, the normal method that randomly shuffles the N numbers and selects, say, first k numbers, can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.

http://arxiv.org/abs/1512.00501

 def random_selection_indices(num_samples, N): modified_entries = {} seq = [] for n in xrange(num_samples): i = N - n - 1 j = random.randrange(i) # swap a[j] and a[i] a_j = modified_entries[j] if j in modified_entries else j a_i = modified_entries[i] if i in modified_entries else i if a_i != j: modified_entries[j] = a_i elif j in modified_entries: # no need to store the modified value if it is the same as index modified_entries.pop(j) if a_j != i: modified_entries[i] = a_j elif i in modified_entries: # no need to store the modified value if it is the same as index modified_entries.pop(i) seq.append(a_j) return seq 

Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:

 new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i)) 

For my use i had a list of 100.000 elements, and because of them being pulled from a DB I about halfed (or better) the time compared to a rnd on the whole list.

Having a large list will reduce the odds greatly for duplicates.