How do you find the kth largest element in an unsorted array of length n in O(n)?

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

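This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) that takes O(n) average time and O(n^2) worst-case time, and a pretty complicated non-randomized algorithm (median of medians; introselect is a hybrid that combines it with quickselect) that takes O(n) worst-case time. There's some info on Wikipedia, but it's not very good.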

Everything you need is in these slides. Just to extract the basic O(n) worst-case algorithm (median of medians):

    Select(A, n, i):
        Divide input into ⌈n/5⌉ groups of size 5.
        /* Partition on median-of-medians */
        medians = array of each group's median.
        pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
        Left Array L and Right Array G = partition(A, pivot)
        /* Find ith element in L, pivot, or G */
        k = |L| + 1
        If i = k, return pivot
        If i < k, return Select(L, k-1, i)
        If i > k, return Select(G, n-k, i-k)

Cormen et al. also cover this in detail in their book "Introduction to Algorithms".
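Here is a minimal Python sketch of the Select pseudocode above. Like the pseudocode, it is 1-based. It adds one thing the pseudocode leaves out: it counts the copies of the pivot, so duplicate values are handled. The name `select` is my own and does not come from the slides.

```python
def select(a, i):
    """Return the i-th smallest element (1-based) of list a, median-of-medians style."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Split into groups of five (the last group may be shorter); take each group's median.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    L = [x for x in a if x < pivot]
    G = [x for x in a if x > pivot]
    equal = len(a) - len(L) - len(G)   # copies of the pivot (the pseudocode assumes one)
    if i <= len(L):
        return select(L, i)
    if i <= len(L) + equal:
        return pivot
    return select(G, i - len(L) - equal)


print(select([7, 2, 9, 4, 4, 1, 8, 3, 6, 5, 0], 6))  # 6th smallest → 4
```

The input is never reordered; each level builds new lists. That costs extra memory, but it keeps the sketch close to the pseudocode.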

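If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)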

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a randomized algorithm, so we compute the worst-case expected running time.

Here is the algorithm.

    QuickSelect(A, k)
      let r be chosen uniformly at random in the range 1 to length(A)
      let pivot = A[r]
      let A1, A2 be new arrays
      # split into a pile A1 of small elements and A2 of big elements
      for i = 1 to length(A)
        if A[i] < pivot then
          append A[i] to A1
        else if A[i] > pivot then
          append A[i] to A2
        else
          # do nothing
      end for
      if k <= length(A1):
        # it's in the pile of small elements
        return QuickSelect(A1, k)
      else if k > length(A) - length(A2)
        # it's in the pile of big elements
        return QuickSelect(A2, k - (length(A) - length(A2)))
      else
        # it's equal to the pivot
        return pivot

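What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

$$T(n) = \Theta(n) + T(n-1) = \Theta(n^2)$$

But if the choices are indeed random, the expected running time is given by

$$T(n) \le \Theta(n) + \frac{1}{n} \sum_{i=1}^{n} T\big(\max(i-1,\ n-i)\big)$$

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that $T(n) \le an$ for some constant $a$. Then we get

$$
\begin{aligned}
T(n) &\le cn + \frac{1}{n} \sum_{i=1}^{n} T\big(\max(i-1,\ n-i)\big) \\
&\le cn + \frac{1}{n} \sum_{i=1}^{\lfloor n/2 \rfloor} T(n-i) + \frac{1}{n} \sum_{i=\lfloor n/2 \rfloor + 1}^{n} T(i) \\
&\le cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} T(i) \\
&\le cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} ai
\end{aligned}
$$

Now somehow we have to get the horrendous sum on the right of the plus sign to absorb the $cn$ on the left. If we just bounded it as $\frac{2}{n} \sum_{i=n/2}^{n} an$, we'd get roughly $\frac{2}{n} \cdot \frac{n}{2} \cdot an = an$. But this is too big: there's no room to squeeze in an extra $cn$. So let's expand the sum using the arithmetic series formula:

$$\sum_{i=\lfloor n/2 \rfloor}^{n} i = \sum_{i=1}^{n} i - \sum_{i=1}^{\lfloor n/2 \rfloor - 1} i = \frac{n(n+1)}{2} - \frac{\left(\lfloor n/2 \rfloor - 1\right)\lfloor n/2 \rfloor}{2} \le \frac{n^2}{2} - \frac{(n/4)^2}{2} = \frac{15}{32} n^2$$

We take advantage of $n$ being "large enough" to replace the ugly $\lfloor n/2 \rfloor$ factors with the much cleaner (and smaller) $n/4$. Now we can continue with

$$cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} ai \le cn + \frac{2a}{n} \cdot \frac{15}{32} n^2 = n\left(c + \frac{15}{16} a\right) \le an$$

provided $a > 16c$.

This gives $T(n) = O(n)$. It's clearly $\Omega(n)$, so we get $T(n) = \Theta(n)$.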

A quick Google search on "kth largest element array" returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

 "Make one pass through tracking the three largest values so far." 

(This was specifically for the 3rd largest.)
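Here is a small Python sketch of that one-pass idea for k = 3. It assumes duplicates count as separate elements, so the 3rd largest of [9, 9, 8] is 8. The name `third_largest` is hypothetical.

```python
def third_largest(values):
    """One pass, keeping the three largest values seen so far (duplicates count separately)."""
    first = second = third = float("-inf")
    for v in values:
        if v > first:                      # new maximum: shift the old ones down
            first, second, third = v, first, second
        elif v > second:                   # new runner-up
            second, third = v, second
        elif v > third:                    # only beats the current third place
            third = v
    return third


print(third_largest([1, 2, 4, 1, 7, 3, 9, 5, 6, 2, 9, 8]))  # → 8
```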

And this answer:

    Build a heap/priority queue.  O(n)
    Pop top element.  O(log n)
    Pop top element.  O(log n)
    Pop top element.  O(log n)
    Total = O(n) + 3 O(log n) = O(n)
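Here is a Python sketch of that heap answer for general k. Python's `heapq` module only provides a min-heap, so the values are negated to simulate a max-heap. The sketch pops k - 1 times and then reads the top, which is equivalent to popping k times. Total cost is O(n + k log n). That is O(n) only when k is a constant, as in the quoted answer. The function name is hypothetical.

```python
import heapq


def kth_largest_heap(values, k):
    """Build a heap in O(n), then pop k - 1 times (O(log n) each) and read the top."""
    heap = [-v for v in values]      # heapq is a min-heap, so negate to simulate a max-heap
    heapq.heapify(heap)              # O(n) bottom-up heap construction
    for _ in range(k - 1):
        heapq.heappop(heap)          # discard the k - 1 largest values
    return -heap[0]                  # the top is now the k-th largest


print(kth_largest_heap([3, 2, 1, 5, 6, 4], 2))  # → 5
```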

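Do it like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked. If it is the kth element, you're done; otherwise, repeat with the bin (higher or lower) that the kth element would fall into. Statistically speaking, the time it takes to find the kth element grows linearly with n, i.e. O(n).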

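The Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high that you'd probably prefer the naive sort-the-list-then-select method.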

I answered the letter of your question :)

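The C++ standard library has almost exactly that function, nth_element, although it does modify your data. It has expected linear running time, O(N), and it also partially sorts the data.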

    #include <algorithm>

    const int N = ...;
    double a[N];
    // ...
    const int m = ...; // m < N
    std::nth_element(a, a + m, a + N);
    // a[m] now holds the element that would be at index m if a were sorted

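Although I'm not very sure about the O(n) complexity, it will certainly be between O(n) and O(n log n), and closer to O(n) than to O(n log n). (Being a randomized quickselect, it is expected O(n), with an O(n^2) worst case.) The function is written in Java: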

    // Requires: import java.util.ArrayList; import java.util.Random;
    // nthSmallest is 0-based: 0 returns the minimum.
    public int quickSelect(ArrayList<Integer> list, int nthSmallest) {
        // Choose a random pivot index in the range 0 to list.size() - 1
        Random random = new Random();
        int pivotIndex = random.nextInt(list.size());
        int pivot = list.get(pivotIndex);
        ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
        ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
        // Split the list in two:
        // values smaller than the pivot go to smallerNumberList,
        // values greater than the pivot go to greaterNumberList,
        // values equal to the pivot are dropped.
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) < pivot) {
                smallerNumberList.add(list.get(i));
            } else if (list.get(i) > pivot) {
                greaterNumberList.add(list.get(i));
            }
        }
        // If nthSmallest is less than the size of smallerNumberList, the answer is in that list
        if (nthSmallest < smallerNumberList.size()) {
            return quickSelect(smallerNumberList, nthSmallest);
        }
        // The first [list.size() - greaterNumberList.size()] positions are taken by the smaller
        // values and the copies of the pivot, so any index at or past that boundary is in
        // greaterNumberList. Shift the index past those positions before recursing.
        else if (nthSmallest >= list.size() - greaterNumberList.size()) {
            nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
            return quickSelect(greaterNumberList, nthSmallest);
        } else {
            return pivot;
        }
    }

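I implemented finding the kth minimum among n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). This mechanism is listed as one of the methods on the Wikipedia page about selection algorithms (as mentioned in one of the posts above). You can read about the algorithm and find the code (Java) on my blog page, Finding Kth Minimum. In addition, the logic can partially sort the list, returning the first K minimums (or maximums) in O(k log(n)) time.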

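Although the code provided returns the kth minimum, similar logic can be used to find the kth maximum in O(k log(n)), ignoring the up-front work done to build the tournament tree.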
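The blog's Java code isn't reproduced here. As a rough illustration of the tournament technique in general, here is my own Python sketch; it is not the author's implementation, and the name `kth_smallest_tournament` is hypothetical. It builds a winner tree over the input (padded to a power of two) in O(n). It then knocks out the current winner and replays only the matches on its leaf-to-root path, which costs O(log n) per removal and O(n + k log n) in total.

```python
def kth_smallest_tournament(values, k):
    """k-th smallest (1-based) via a tournament (winner) tree: O(n + k log n)."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    keys = list(values) + [float("inf")] * (size - n)  # pad with +inf to a power of two
    win = [0] * (2 * size)        # win[p] = index into keys of the winner of subtree p
    for j in range(size):
        win[size + j] = j         # leaves
    for p in range(size - 1, 0, -1):                   # play every match bottom-up: O(n)
        l, r = win[2 * p], win[2 * p + 1]
        win[p] = l if keys[l] <= keys[r] else r
    for _ in range(k - 1):                             # remove the current winner k - 1 times
        j = win[1]
        keys[j] = float("inf")                         # knock the winner out
        p = (size + j) // 2
        while p >= 1:                                  # replay only its path: O(log n)
            l, r = win[2 * p], win[2 * p + 1]
            win[p] = l if keys[l] <= keys[r] else r
            p //= 2
    return keys[win[1]]


print(kth_smallest_tournament([7, 2, 9, 4, 1], 2))  # → 2
```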

You can do it in O(n + kn) = O(n) time (for constant k) and O(k) space by keeping track of the k largest elements you've seen.

For each element in the array, scan the list of the k largest and replace the smallest one with the new element if the new element is bigger.
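Here is a minimal Python sketch of that O(k)-space scan. It assumes the input has at least k values, and the function name is hypothetical. It consumes the input through an iterator, so only the k-element buffer is stored.

```python
def kth_largest_buffer(values, k):
    """Keep the k largest values seen so far in a plain list: O(k) extra space, O(nk) time."""
    it = iter(values)
    top = [next(it) for _ in range(k)]      # first k values; assumes at least k values
    for v in it:
        # O(k) scan for the position of the smallest value currently kept
        i_min = min(range(k), key=top.__getitem__)
        if v > top[i_min]:
            top[i_min] = v                  # replace it with the larger element
    return min(top)                         # smallest of the k largest = k-th largest


print(kth_largest_buffer([1, 2, 4, 1, 7, 3, 9, 5, 6, 2, 9, 8], 3))  # → 8
```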

Warren's priority heap solution is neat, though.

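Read Chapter 9, Medians and Order Statistics, of Cormen's "Introduction to Algorithms", 2nd Ed. It has an expected linear time algorithm for selection. It's not something that people would randomly come up with in a few minutes. A heap sort, by the way, won't work in O(n); it's O(n lg n).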

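Find the median of the array in linear time. Then use the partition procedure, exactly as in quicksort, to divide the array into two parts: values to the left of the median are smaller (<) than the median, and values to the right are larger (>). This can also be done in linear time. Now recurse into the part of the array where the kth element lies. The recurrence becomes T(n) = T(n/2) + cn, which gives O(n) overall.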

Sexy quickselect in Python:

    import random

    def quickselect(arr, k):
        '''
        k = 1 returns first element in ascending order.
        can be easily modified to return first element in descending order
        '''
        r = random.randrange(0, len(arr))
        # partition around arr[r]
        a1 = [i for i in arr if i < arr[r]]
        a2 = [i for i in arr if i > arr[r]]
        if k <= len(a1):
            return quickselect(a1, k)
        elif k > len(arr) - len(a2):
            return quickselect(a2, k - (len(arr) - len(a2)))
        else:
            return arr[r]

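Here is a link to a full implementation, with quite an extensive explanation of the algorithm for finding the Kth element in an unsorted array. The basic idea is to partition the array as in QuickSort. To avoid extreme cases, however, a special pivot selection called the median-of-medians algorithm is applied. (An extreme case is picking the smallest element as the pivot at every step, which degrades the algorithm to O(n^2) running time.) The whole solution runs in O(n) time in both the worst and the average case.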

Here is the link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):

Finding Kth Smallest Element in an Unsorted Array

According to the paper "Finding the Kth largest item in a list of n items", the following algorithm takes O(n) time in the worst case.

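  1. Divide the array into n/5 lists of 5 elements each.
  2. Find the median of each sub-array of 5 elements.
  3. Recursively find the median of all the medians; let's call it M.
  4. Partition the array into two sub-arrays: the first contains the elements larger than M (call it a1), and the other contains the elements smaller than M (call it a2).
  5. If k <= |a1|, return select(a1, k).
  6. If k - 1 = |a1|, return M.
  7. If k > |a1| + 1, return select(a2, k - |a1| - 1).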

Analysis: as suggested in the original paper:

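We use the median to partition the list into two halves (the first half if k <= n/2, and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse on a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + ... = 2cn = O(n).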

Why is the group size 5 and not 3?

As mentioned in the original paper:

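Dividing the list by 5 assures a worst-case split of 70-30. At least half of the medians are greater than the median of medians, so at least half of the n/5 blocks have at least 3 elements greater than it. This gives a 3n/10 split, which means the other partition is at most 7n/10 in the worst case. That gives T(n) = T(n/5) + T(7n/10) + O(n). Since n/5 + 7n/10 < n, the worst-case running time is O(n).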
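Unrolling that recurrence shows where the constant comes from. This sketch ignores floors, ceilings, and small base cases; that simplification is mine and not part of the paper's argument. At each level of the recursion tree, the subproblems have total size at most 9/10 of the level above, because 1/5 + 7/10 = 9/10.

```latex
\begin{aligned}
T(n) &\le cn + T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) \\
     &\le cn \left(1 + \tfrac{9}{10} + \left(\tfrac{9}{10}\right)^{2} + \cdots\right)
      = cn \sum_{j \ge 0} \left(\tfrac{9}{10}\right)^{j} = 10\,cn = O(n).
\end{aligned}
```

With groups of 3, the same argument gives T(n/3) + T(2n/3). Those sizes sum to n, so every level costs about cn and the total becomes O(n log n). That is why groups of 5 are the usual choice.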

Now I have tried to implement the above algorithm:

    // Requires: import java.util.ArrayList; import java.util.Arrays; import java.util.List;
    // Assumes distinct elements; k = 1 returns the largest.
    public static int findKthLargestUsingMedian(Integer[] array, int k) {
        // Base case: small inputs are simply sorted.
        if (array.length <= 5) {
            Integer[] copy = array.clone();
            Arrays.sort(copy);
            return copy[copy.length - k];
        }
        // Step 1: Divide the list into n/5 lists of 5 elements each.
        int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
        // Step 2: Find the pivotal element, aka the median of medians.
        int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
        // Split around medianOfMedian: listWithGreaterNumbers gets the elements greater than it,
        // listWithSmallerNumbers gets the elements less than it.
        List<Integer> listWithGreaterNumbers = new ArrayList<>();
        List<Integer> listWithSmallerNumbers = new ArrayList<>();
        for (Integer element : array) {
            if (element < medianOfMedian) {
                listWithSmallerNumbers.add(element);
            } else if (element > medianOfMedian) {
                listWithGreaterNumbers.add(element);
            }
        }
        // Next step.
        if (k <= listWithGreaterNumbers.size())
            return findKthLargestUsingMedian(listWithGreaterNumbers.toArray(new Integer[0]), k);
        else if ((k - 1) == listWithGreaterNumbers.size())
            return medianOfMedian;
        else if (k > (listWithGreaterNumbers.size() + 1))
            return findKthLargestUsingMedian(listWithSmallerNumbers.toArray(new Integer[0]),
                    k - listWithGreaterNumbers.size() - 1);
        return -1;
    }

    public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
        Integer[] medians = new Integer[noOfRequiredLists];
        for (int count = 0; count < noOfRequiredLists; count++) {
            int startOfPartialArray = 5 * count;
            // The last group may have fewer than 5 elements.
            int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
            Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
            // Step 2: Find the median of each sublist (it has to be sorted first).
            Arrays.sort(partialArray);
            medians[count] = partialArray[partialArray.length / 2];
        }
        // Step 3: Find the median of the medians recursively.
        return findKthLargestUsingMedian(medians, medians.length / 2 + 1);
    }

For completeness, here is another algorithm that uses a priority queue and takes O(n log n) time:

    // Requires: import java.util.PriorityQueue;
    public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create a priority queue (a min-heap) where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
        // place all the elements of the array into this priority queue
        for (int n : nums) {
            pq.add(n);
        }
        // extract the kth largest element: poll the smallest n - k + 1 elements
        while (numElements - k + 1 > 0) {
            p = pq.poll();
            k++;
        }
        return p;
    }

Both algorithms can be tested as follows:

    // Requires: import java.io.IOException;
    public static void main(String[] args) throws IOException {
        Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        System.out.println(findKthLargestUsingMedian(numbers, 8));
        System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
    }

As expected, the output is: 18 18

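Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value and bump stored values 1-4 down one place, so that value 5 drops off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).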

I'd like to suggest an answer.

Take the first k elements and sort them into a linked list of k values.

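Now for every other value, even in the worst case where we insertion-sort each of the remaining n-k values into the list, the number of comparisons is k*(n-k). Sorting the first k values takes k*(k-1) comparisons, so the total is nk - k, which is O(n) for a fixed k.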

Cheers

An explanation of the median-of-medians algorithm for finding the kth largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

The implementation in C++ is below:

    #include <iostream>
    #include <vector>
    #include <algorithm>
    using namespace std;

    int findMedian(vector<int> vec) {
        // Find the median of a vector (vec is a copy, so sorting it is harmless)
        sort(vec.begin(), vec.end());
        return vec[vec.size() / 2];
    }

    int findMedianOfMedians(vector<vector<int> > values) {
        vector<int> medians;
        for (size_t i = 0; i < values.size(); i++) {
            medians.push_back(findMedian(values[i]));
        }
        // Sorting the medians keeps this short; the textbook version selects recursively
        return findMedian(medians);
    }

    void selectionByMedianOfMedians(const vector<int> values, int k) {
        // Divide the list into n/5 lists of 5 elements each
        vector<vector<int> > vec2D;
        size_t count = 0;
        while (count != values.size()) {
            int countRow = 0;
            vector<int> row;
            while ((countRow < 5) && (count < values.size())) {
                row.push_back(values[count]);
                count++;
                countRow++;
            }
            vec2D.push_back(row);
        }
        cout << endl << endl << "Printing 2D vector : " << endl;
        for (size_t i = 0; i < vec2D.size(); i++) {
            for (size_t j = 0; j < vec2D[i].size(); j++) {
                cout << vec2D[i][j] << " ";
            }
            cout << endl;
        }
        cout << endl;
        // Calculating a new pivot for making splits
        int m = findMedianOfMedians(vec2D);
        cout << "Median of medians is : " << m << endl;
        // Partition the list into unique elements larger than 'm' (call this sublist L1) and
        // those smaller than 'm' (call this sublist L2)
        vector<int> L1, L2;
        for (size_t i = 0; i < vec2D.size(); i++) {
            for (size_t j = 0; j < vec2D[i].size(); j++) {
                if (vec2D[i][j] > m) {
                    L1.push_back(vec2D[i][j]);
                } else if (vec2D[i][j] < m) {
                    L2.push_back(vec2D[i][j]);
                }
            }
        }
        // Checking the splits as per the new pivot 'm'
        cout << endl << "Printing L1 : " << endl;
        for (size_t i = 0; i < L1.size(); i++) {
            cout << L1[i] << " ";
        }
        cout << endl << endl << "Printing L2 : " << endl;
        for (size_t i = 0; i < L2.size(); i++) {
            cout << L2[i] << " ";
        }
        // Recursive calls
        if ((k - 1) == (int)L1.size()) {
            cout << endl << endl << "Answer :" << m;
        } else if (k <= (int)L1.size()) {
            return selectionByMedianOfMedians(L1, k);
        } else if (k > ((int)L1.size() + 1)) {
            return selectionByMedianOfMedians(L2, k - ((int)L1.size()) - 1);
        }
    }

    int main() {
        int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6,
                        10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        vector<int> vec(values, values + 25);
        cout << "The given array is : " << endl;
        for (size_t i = 0; i < vec.size(); i++) {
            cout << vec[i] << " ";
        }
        selectionByMedianOfMedians(vec, 8);
        return 0;
    }

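There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.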

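In more detail: I used Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paid some attention to the final steps of the partitioning part of the algorithm. The result is the following algorithm (which could be named "LefSelect"):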

    #define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

    // Note: The code needs more than 2 elements to work.
    // k is a 0-based index; returns the value that would be at a[k] if a were sorted.
    float lefselect(float a[], const int n, const int k) {
        int l = 0, m = n - 1, i = l, j = m;
        float x;

        while (l < m) {
            if (a[k] < a[i]) F_SWAP(a[i], a[k]);
            if (a[j] < a[i]) F_SWAP(a[i], a[j]);
            if (a[j] < a[k]) F_SWAP(a[k], a[j]);

            x = a[k];
            while (j > k && i < k) {
                do i++; while (a[i] < x);
                do j--; while (a[j] > x);

                F_SWAP(a[i], a[j]);
            }
            i++; j--;

            if (j < k) {
                while (a[i] < x) i++;
                l = i; j = m;
            }
            if (k < i) {
                while (x < a[j]) j--;
                m = j; i = l;
            }
        }
        return a[k];
    }

In the benchmarks I ran, LefSelect was 20-30% faster than QuickSelect.

Haskell solution:

    kthElem index list = sort list !! index

    withShape ~[]     []     = []
    withShape ~(x:xs) (y:ys) = x : withShape xs ys

    sort []     = []
    sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
      where
        ls = filter (<  x) xs
        rs = filter (>= x) xs

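This uses the withShape function to discover the size of a partition without actually sorting it. Thanks to laziness, only the partitions on the way to the requested index get sorted. Note that the pivot is simply the first element of each list, so this behaves like quickselect (O(n^2) in the worst case) rather than median of medians.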

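Here is a C++ implementation of randomized QuickSelect. The idea is to pick a random pivot element. To implement the randomized partition, we use rand() to generate an index between l and r. We then swap the element at that index with the last element, and finally call the standard partition process, which uses the last element as the pivot.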

    #include <iostream>
    #include <climits>
    #include <cstdlib>
    using namespace std;

    int randomPartition(int arr[], int l, int r);

    // This function returns k'th smallest element in arr[l..r] using
    // QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
    int kthSmallest(int arr[], int l, int r, int k) {
        // If k is smaller than number of elements in array
        if (k > 0 && k <= r - l + 1) {
            // Partition the array around a random element and
            // get position of pivot element in sorted array
            int pos = randomPartition(arr, l, r);

            // If position is same as k
            if (pos - l == k - 1)
                return arr[pos];
            if (pos - l > k - 1) // If position is more, recur for left subarray
                return kthSmallest(arr, l, pos - 1, k);

            // Else recur for right subarray
            return kthSmallest(arr, pos + 1, r, k - pos + l - 1);
        }

        // If k is more than number of elements in array
        return INT_MAX;
    }

    void swap(int *a, int *b) {
        int temp = *a;
        *a = *b;
        *b = temp;
    }

    // Standard partition process of QuickSort(). It considers the last
    // element as pivot and moves all smaller elements to its left and
    // greater elements to its right. This function is used by randomPartition()
    int partition(int arr[], int l, int r) {
        int x = arr[r], i = l;
        for (int j = l; j <= r - 1; j++) {
            if (arr[j] <= x) { // arr[j] belongs on the left side, so move it to position i
                swap(&arr[i], &arr[j]);
                i++;
            }
        }
        swap(&arr[i], &arr[r]); // put the pivot in its final place
        return i;
    }

    // Picks a random pivot element between l and r and partitions
    // arr[l..r] around the randomly picked element using partition()
    int randomPartition(int arr[], int l, int r) {
        int n = r - l + 1;
        int pivot = rand() % n;
        swap(&arr[l + pivot], &arr[r]);
        return partition(arr, l, r);
    }

    // Driver program to test above methods
    int main() {
        int arr[] = {12, 3, 5, 7, 4, 19, 26};
        int n = sizeof(arr) / sizeof(arr[0]), k = 3;
        cout << "K'th smallest element is " << kthSmallest(arr, 0, n - 1, k);
        return 0;
    }

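The worst-case time complexity of the above solution is still O(n^2): in the worst case, the random function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).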

Here's an implementation in JavaScript.

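If you drop the constraint that you cannot modify the array, you can avoid using extra memory by using two indexes to identify the "current partition" (in classic quicksort style: http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).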

    function kthMax(a, k) {
        var size = a.length;
        var pivot = a[Math.floor(Math.random() * size)]; // Another choice could have been (size / 2)

        // Create an array with all elements lower than the pivot and an array with all elements higher than the pivot
        var i, lowerArray = [], upperArray = [];
        for (i = 0; i < size; i++) {
            var current = a[i];
            if (current < pivot) {
                lowerArray.push(current);
            } else if (current > pivot) {
                upperArray.push(current);
            }
        }

        // Which one should I continue with?
        if (k <= upperArray.length) {
            // Upper
            return kthMax(upperArray, k);
        } else {
            var newK = k - (size - lowerArray.length);
            if (newK > 0) {
                // Lower
                return kthMax(lowerArray, newK);
            } else {
                // None... it's the current pivot!
                return pivot;
            }
        }
    }

If you want to test its performance, you can use this variation:

    function kthMax(a, k, logging) {
        var comparisonCount = 0; // Number of comparisons that the algorithm uses
        var memoryCount = 0;     // Number of integers in memory that the algorithm uses
        var _log = logging;

        // k is 1-based: 1 is the maximum, a.length is the minimum
        if (k < 1 || k > a.length) {
            if (_log) console.log("k is out of range");
            return false;
        }

        function _kthmax(a, k) {
            var size = a.length;
            var pivot = a[Math.floor(Math.random() * size)];
            if (_log) console.log("Inputs:", a, "size=" + size, "k=" + k, "pivot=" + pivot);

            // This should never happen. Just a nice check in this exercise
            // if you are playing with the code to avoid never-ending recursion
            if (typeof pivot === "undefined") {
                if (_log) console.log("Ops...");
                return false;
            }

            var i, lowerArray = [], upperArray = [];
            for (i = 0; i < size; i++) {
                var current = a[i];
                if (current < pivot) {
                    comparisonCount += 1;
                    memoryCount++;
                    lowerArray.push(current);
                } else if (current > pivot) {
                    comparisonCount += 2;
                    memoryCount++;
                    upperArray.push(current);
                }
            }
            if (_log) console.log("Pivoting:", lowerArray, "*" + pivot + "*", upperArray);

            if (k <= upperArray.length) {
                comparisonCount += 1;
                return _kthmax(upperArray, k);
            } else if (k > size - lowerArray.length) {
                comparisonCount += 2;
                return _kthmax(lowerArray, k - (size - lowerArray.length));
            } else {
                comparisonCount += 2;
                return pivot;
            }
            /*
             * BTW, this is the logic for kthMin if we want to implement that... ;-)
             *
             * if (k <= lowerArray.length) {
             *     return kthMin(lowerArray, k);
             * } else if (k > size - upperArray.length) {
             *     return kthMin(upperArray, k - (size - upperArray.length));
             * } else
             *     return pivot;
             */
        }

        var result = _kthmax(a, k);
        return {result: result, iterations: comparisonCount, memory: memoryCount};
    }

The rest of the code just creates a playground:

    function getRandomArray(n) {
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l));
        }
        return ar;
    }

    // Create a random array of 50 numbers
    var ar = getRandomArray(50);

Now run your tests a few times. Because of Math.random(), the results will differ on every run:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.
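The in-place variant suggested at the start of this answer uses two indexes for the "current partition" instead of new arrays. Here it is sketched in Python, assuming it's acceptable to reorder the input list. The partition scheme (Lomuto, with a random pivot swapped to the end) is my choice. The linked post describes quicksort, not this exact code, and the function name is hypothetical.

```python
import random


def kth_largest_inplace(a, k):
    """Return the k-th largest element (1-based) of list a, reordering a in place."""
    target = len(a) - k            # index the answer would have if a were sorted ascending
    lo, hi = 0, len(a) - 1         # the "current partition" is a[lo..hi]
    while True:
        # Move a random pivot to the end, then Lomuto-partition a[lo..hi] around it.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # the pivot lands at its sorted position
        if store == target:
            return a[store]
        if store < target:
            lo = store + 1                  # the answer is to the right of the pivot
        else:
            hi = store - 1                  # the answer is to the left of the pivot


nums = [3, 2, 1, 5, 6, 4]
print(kth_largest_inplace(nums, 2))  # → 5
```

The only extra memory is a handful of indexes, but with many equal values this partition scheme can degrade toward O(n^2).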

I came up with this algorithm, and it seems to be O(n):

Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If an array item is greater than our minimum, we replace the min variable with the item's value. We continue doing this until the end of the array. The minimum of our three variables is then the 3rd largest item in the array.

    define variables a=0, b=0, c=0
    iterate through the array items
        find minimum a,b,c
        if item > min then replace the min variable with item value
        continue until end of array
    the minimum of a,b,c is our answer

And, to find Kth largest item we need K variables.

Example: (k=3)

    [1,2,4,1,7,3,9,5,6,2,9,8]

    Final variable values:

    a=7 (answer)
    b=8
    c=9

Can someone please review this and let me know what I am missing?

Here is the implementation of the algorithm eladv suggested (I've also included the implementation with a random pivot):

    import java.util.Arrays;

    public class Median {
        public static void main(String[] s) {
            int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
            System.out.println(selectK(test,8));

            /*
            int n = 100000000;
            int[] test = new int[n];
            for(int i=0; i<test.length; i++)
                test[i] = (int)(Math.random()*test.length);

            long start = System.currentTimeMillis();
            random_selectK(test, test.length/2);
            long end = System.currentTimeMillis();
            System.out.println(end - start);
            */
        }

        public static int random_selectK(int[] a, int k) {
            if(a.length <= 1)
                return a[0];

            int r = (int)(Math.random() * a.length);
            int p = a[r];

            int small = 0, equal = 0, big = 0;
            for(int i=0; i<a.length; i++) {
                if(a[i] < p) small++;
                else if(a[i] == p) equal++;
                else if(a[i] > p) big++;
            }

            if(k <= small) {
                int[] temp = new int[small];
                for(int i=0, j=0; i<a.length; i++)
                    if(a[i] < p)
                        temp[j++] = a[i];
                return random_selectK(temp, k);
            }
            else if (k <= small+equal)
                return p;
            else {
                int[] temp = new int[big];
                for(int i=0, j=0; i<a.length; i++)
                    if(a[i] > p)
                        temp[j++] = a[i];
                return random_selectK(temp,k-small-equal);
            }
        }

        public static int selectK(int[] a, int k) {
            if(a.length <= 5) {
                Arrays.sort(a);
                return a[k-1];
            }

            int p = median_of_medians(a);

            int small = 0, equal = 0, big = 0;
            for(int i=0; i<a.length; i++) {
                if(a[i] < p) small++;
                else if(a[i] == p) equal++;
                else if(a[i] > p) big++;
            }

            if(k <= small) {
                int[] temp = new int[small];
                for(int i=0, j=0; i<a.length; i++)
                    if(a[i] < p)
                        temp[j++] = a[i];
                return selectK(temp, k);
            }
            else if (k <= small+equal)
                return p;
            else {
                int[] temp = new int[big];
                for(int i=0, j=0; i<a.length; i++)
                    if(a[i] > p)
                        temp[j++] = a[i];
                return selectK(temp,k-small-equal);
            }
        }

        private static int median_of_medians(int[] a) {
            int[] b = new int[a.length/5];
            int[] temp = new int[5];
            for(int i=0; i<b.length; i++) {
                for(int j=0; j<5; j++)
                    temp[j] = a[5*i + j];
                Arrays.sort(temp);
                b[i] = temp[2];
            }

            return selectK(b, b.length/2 + 1);
        }
    }

How about this kind of approach?

Maintain a buffer of length k and a tmp_max. Getting tmp_max is O(k) and is done n times, so the total is something like O(kn).


Is it right, or am I missing something?

Although it doesn't beat the average case of quickselect or the worst case of the median-of-medians method, it's pretty easy to understand and implement.

It is similar to the quicksort strategy: we pick an arbitrary pivot and bring the smaller elements to its left and the larger ones to its right.

    // Requires: using System.Collections.Generic;
    // k is 1-based: k = 1 returns the smallest element.
    public static int kthElInUnsortedList(List<int> list, int k)
    {
        if (list.Count == 1)
            return list[0];

        List<int> left = new List<int>();
        List<int> right = new List<int>();

        int pivotIndex = list.Count / 2;
        int pivot = list[pivotIndex]; // arbitrary

        for (int i = 0; i < list.Count; i++)
        {
            if (i == pivotIndex)
                continue; // skip the pivot itself, not the rest of the list
            int currentEl = list[i];
            if (currentEl < pivot)
                left.Add(currentEl);
            else
                right.Add(currentEl);
        }

        if (k == left.Count + 1)
            return pivot;

        if (left.Count < k)
            return kthElInUnsortedList(right, k - left.Count - 1);
        else
            return kthElInUnsortedList(left, k);
    }

  1. Create a priority queue (a max-heap).
  2. Insert all the elements into the heap.
  3. Call poll() k times.

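     // Requires: import java.util.PriorityQueue;
     public static int getKthLargestElements(int[] arr, int k) {
         // Max-heap: the comparator puts larger values first
         PriorityQueue<Integer> pq = new PriorityQueue<>((x, y) -> Integer.compare(y, x));
         // insert all the elements into the heap
         for (int ele : arr)
             pq.offer(ele);
         // call poll() k times; the last value polled is the kth largest
         int result = 0;
         for (int i = 0; i < k; i++) {
             result = pq.poll();
         }
         return result;
     }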

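You can find the kth smallest element in (effectively) linear time and constant extra space if the array contains only integers. More precisely, the running time is O(n log R), where R = max - min is the range of the values; for fixed-width integers, log R is bounded by the word size.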

The approach is to do a binary search on the range of array values. If we have a min_value and a max_value, both in the integer range, we can binary-search that range. We can write a comparator function that tells us whether a given value is the kth smallest, smaller than the kth smallest, or bigger than the kth smallest. Continue the binary search until you reach the kth smallest number.

Here is the code for that

    class Solution:
        def _iskthsmallest(self, A, val, k):
            less_count, equal_count = 0, 0
            for i in range(len(A)):
                if A[i] == val:
                    equal_count += 1
                if A[i] < val:
                    less_count += 1

            if less_count >= k:
                return 1      # val is bigger than the kth smallest
            if less_count + equal_count < k:
                return -1     # val is smaller than the kth smallest
            return 0          # val is the kth smallest

        def kthsmallest_binary(self, A, min_val, max_val, k):
            if min_val == max_val:
                return min_val
            mid = (min_val + max_val) // 2
            iskthsmallest = self._iskthsmallest(A, mid, k)
            if iskthsmallest == 0:
                return mid
            if iskthsmallest > 0:
                return self.kthsmallest_binary(A, min_val, mid, k)
            return self.kthsmallest_binary(A, mid + 1, max_val, k)

        # @param A : tuple of integers
        # @param k : integer
        # @return an integer
        def kthsmallest(self, A, k):
            if not A:
                return 0
            if k > len(A):
                return 0
            min_val, max_val = min(A), max(A)
            return self.kthsmallest_binary(A, min_val, max_val, k)

What I would do is this:

    initialize empty doubly linked list l
    for each element e in array
        if e larger than head(l)
            make e the new head of l
            if size(l) > k
                remove last element from l

    the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

Update:

    initialize empty sorted tree l   // head(l) = largest, tail(l) = smallest
    for each element e in array
        if size(l) < k or e larger than tail(l)
            insert e into l // O(log k)
            if size(l) > k
                remove last element from l

    the last element of l should now be the kth largest element
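The sorted tree in the update can be replaced by a binary min-heap of size k. That is a different data structure from the one in the answer. It supports the same step, "compare with the smallest kept value and replace it", in O(log k), for O(n log k) time and O(k) space overall. Here is a minimal Python sketch using the standard `heapq` module; the function name is hypothetical.

```python
import heapq


def kth_largest_bounded_heap(values, k):
    """Keep only the k largest values in a min-heap: O(n log k) time, O(k) space."""
    heap = []                                # min-heap of the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:                    # beats the smallest value we are keeping
            heapq.heapreplace(heap, v)       # pop the smallest, then push v
    return heap[0]                           # smallest of the k largest = k-th largest


print(kth_largest_bounded_heap([3, 2, 1, 5, 6, 4], 2))  # → 5
```

For a one-liner, `heapq.nlargest(k, values)[-1]` gives the same result.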

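First we can build a BST from the unsorted array, and then find the kth smallest element in the BST. If each node stores the size of its subtree, that lookup takes O(log n) in a balanced tree. Note, however, that building the BST itself takes O(n log n) time (O(n^2) if it is unbalanced), not O(n), so the overall cost is O(n log n).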