A simple interview question made harder: given numbers 1..100, find the missing number(s)

I had an interesting job interview experience a while back. The question started out really simple:

Q1: We have a bag containing the numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.

I had heard this interview question before, of course, so I very quickly answered along the lines of:

A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.

Thus, if all the numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that missing number. So we can find the missing number in O(N) time and O(1) space.
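
In code, the idea behind A1 is just a one-liner; a minimal Python sketch (the function name is only illustrative):

    def find_one_missing(bag, n):
        # Expected sum of 1..n minus what is actually in the bag.
        return n * (n + 1) // 2 - sum(bag)

    print(find_one_missing([1, 2, 4, 5], 5))  # -> 3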

At this point I was thinking I had done pretty well, but all of a sudden the question took an unexpected turn:

Q2: That is correct, but now what would you do if TWO numbers are missing?

I had never seen/heard/considered this variation before, so I panicked and could not answer the question. The interviewer insisted on knowing my thought process, so I mentioned that maybe we could get more information by comparing against the expected product, or by doing a second pass after having gathered some information from the first pass, etc., but I really was just shooting in the dark rather than actually having a clear path to a solution.

The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of sour (for not knowing the answer), and asked whether this is a general (read: "useful") programming technique, or whether it is just a trick/gotcha answer.

The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.

Qk: If exactly k numbers are missing from the bag, how would you find them efficiently?

This was a few months ago, and I still have not figured out what this technique is. Obviously there is an Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (excluding the O(N) time input scan) is defined in terms of k, not N.

So the question here is simple:

  • How would you solve Q2?
  • How would you solve Q3?
  • How would you solve Qk?

Clarifications

  • Generally there are N numbers from 1..N, not just 1..100.
  • I am not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence of each number by the value of a designated bit, and therefore using O(N) bits of additional space. We cannot afford any additional space proportional to N.
  • I am also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I am looking for the Holy Grail solution (which may or may not be practical to implement, but still has the desired asymptotic characteristics).

So of course, you must scan the input in O(N), but you can only capture a small amount of information (defined in terms of k, not N), and must then somehow find the k missing numbers.

Here is a summary of Dimitris Andreou's link.

Remember the sums of the i-th powers, where i = 1, 2, …, k. This reduces the problem to solving the system of equations

a_1 + a_2 + … + a_k = b_1

a_1^2 + a_2^2 + … + a_k^2 = b_2

…

a_1^k + a_2^k + … + a_k^k = b_k

Using Newton's identities, knowing the b_i allows one to compute

c_1 = a_1 + a_2 + … + a_k

c_2 = a_1 a_2 + a_1 a_3 + … + a_{k-1} a_k

…

c_k = a_1 a_2 … a_k

If you expand the polynomial (x - a_1)…(x - a_k), its coefficients are exactly c_1, …, c_k (up to sign); see Vieta's formulas. Since polynomial factorization is unique (the polynomial ring is a Euclidean domain), this means the a_i are uniquely determined, up to permutation.

This concludes the proof that remembering the power sums is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct method of computing c_1, …, c_k is prohibitively expensive, since e.g. c_k is the product of all the missing numbers, which has magnitude around n!/(n-k)!. To overcome this, perform the computations in the field Z_q, where q is a prime such that n <= q < 2n; such a prime exists by Bertrand's postulate. The proof does not need to change, since the formulas still hold and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example Berlekamp's or Cantor-Zassenhaus's algorithm.

High-level pseudocode for constant k:

  • Compute the i-th power sums of the given numbers
  • Subtract to get the i-th power sums of the unknown numbers. Call them b_i.
  • Use Newton's identities to compute the coefficients c_i from the b_i. Basically, c_1 = b_1; c_2 = (c_1 b_1 - b_2)/2; see Wikipedia for the exact formulas
  • Factor the polynomial x^k - c_1 x^(k-1) + c_2 x^(k-2) - … + (-1)^k c_k
  • The roots of the polynomial are the missing numbers a_1, …, a_k
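
A minimal Python sketch of those steps for constant k (names are mine; for clarity the factoring step is replaced by trial evaluation of the polynomial over 1..n, rather than the finite-field factorization described below, so this is not the space-optimal version):

    def find_missing_power_sums(bag, n, k):
        # b[i] = sum of i-th powers of the missing numbers, i = 1..k
        b = [0] * (k + 1)
        for i in range(1, k + 1):
            b[i] = sum(x**i for x in range(1, n + 1)) - sum(x**i for x in bag)

        # Newton's identities: i*c[i] = c[i-1]*b[1] - c[i-2]*b[2] + ... +/- b[i]
        c = [1] + [0] * k                      # c[0] = 1 by convention
        for i in range(1, k + 1):
            acc = 0
            for j in range(1, i + 1):
                acc += (-1) ** (j - 1) * c[i - j] * b[j]
            c[i] = acc // i                    # always divides evenly

        # The missing numbers are the roots of
        #   x^k - c[1] x^(k-1) + c[2] x^(k-2) - ... + (-1)^k c[k]
        def poly(x):
            return sum((-1) ** i * c[i] * x ** (k - i) for i in range(k + 1))

        return [x for x in range(1, n + 1) if poly(x) == 0]

    print(find_missing_power_sums([1, 2, 4, 6, 7, 9, 10], 10, 3))  # -> [3, 5, 8]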

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform all the steps with all numbers reduced modulo q.

As Heinrich Apfelmus commented, instead of a prime q you can use q = 2^⌈log n⌉ and perform the arithmetic in a finite field.

You can find the generalization you are looking for by reading a couple of pages of Muthukrishnan's Data Stream Algorithms: Puzzle 1: Finding Missing Numbers. Perhaps this is what your interviewer had read and why he posed these questions.

Now, if only people would start deleting the answers that are subsumed or superseded by Muthukrishnan's treatment, and make this text easier to find. :)


Also see sdcvvc's directly related answer, which also includes pseudocode (hooray! no need to read those tricky math formulations :)) (thanks, great work!).

We can solve Q2 by summing both the numbers themselves and the squares of the numbers.

We can then reduce the problem to

    k1 + k2 = x
    k1^2 + k2^2 = y

where x and y are how far below the expected values the two sums are.

Substituting gives us:

 (x-k2)^2 + k2^2 = y 

which we can then solve to determine our missing numbers.
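
As an illustration, a minimal Python sketch of that approach (function and variable names are mine, not from the answer):

    import math

    def two_missing(bag, n):
        x = n * (n + 1) // 2 - sum(bag)                               # k1 + k2
        y = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in bag)  # k1^2 + k2^2

        # (x - k2)^2 + k2^2 = y  =>  2*k2^2 - 2*x*k2 + (x^2 - y) = 0
        disc = math.isqrt(4 * x * x - 8 * (x * x - y))
        k2 = (2 * x + disc) // 4
        k1 = x - k2
        return k1, k2

    print(two_missing([1, 2, 4, 6, 7, 8], 8))  # -> (3, 5)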

As @j_random_hacker pointed out, this is quite similar to finding duplicates in O(n) time and O(1) space, and my answer there applies here too.

Assuming the "bag" is represented by a 1-based array A[] of size N - k, we can solve Qk in O(N) time and O(k) additional space.

First, we extend our array A[] by k elements, so that it is now of size N. This is the O(k) additional space. We then run the following pseudocode algorithm:

    for i := n - k + 1 to n
        A[i] := A[1]
    end for

    for i := 1 to n - k
        while A[A[i]] != A[i]
            swap(A[i], A[A[i]])
        end while
    end for

    for i := 1 to n
        if A[i] != i then
            print i
        end if
    end for

The first loop initialises the k extra entries to the same value as the first entry in the array (this is just a convenient value that we know is already present in the array; after this step, any entries that were missing from the initial array of size N-k are still missing from the extended array).

The second loop permutes the extended array so that if element x is present at least once, then one of those copies will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time: a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i where that was not true before. This means that the total number of swaps (and therefore the total number of executions of the while loop body) is at most N-1.

The third loop prints those indexes i of the array that are not occupied by the value i, which means that i must be missing.
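
A minimal Python sketch of the pseudocode above, using a 0-based list (names are illustrative):

    def find_missing_inplace(bag, n):
        k = n - len(bag)
        A = list(bag) + [bag[0]] * k          # extend with a value known to be present

        # Permute so that if a value x is present, one copy ends up at A[x-1].
        for i in range(n):
            while A[A[i] - 1] != A[i]:
                j = A[i] - 1
                A[i], A[j] = A[j], A[i]

        # Any index not occupied by its own value corresponds to a missing number.
        return [i + 1 for i in range(n) if A[i] != i + 1]

    print(find_missing_inplace([2, 9, 3, 7, 8, 6, 5, 1], 10))  # -> [4, 10]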

I asked a 4-year-old to solve this problem. He sorted the numbers and then counted along them. This has a space requirement of O(kitchen floor), and it works just as easily however many balls are missing.

I have not checked the maths, but I suspect that computing Σ(n^2) in the same pass as we compute Σ(n) would provide enough information to get two missing numbers, doing Σ(n^3) as well gives three, and so on.

Not sure whether this is the most efficient solution, but I would iterate over all entries and use a bitset to remember which numbers are present, and then test for 0 bits.

I like simple solutions, and I even believe that it might be faster than computing the sum, or the sum of squares, etc.
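
A minimal sketch of that bitset idea in Python, packing the bits into a single integer (note this uses O(N) bits, which is exactly what the question rules out, but it is simple and practical):

    def find_missing_bitset(bag, n):
        present = 0                  # n-bit bitset packed into one integer
        for x in bag:
            present |= 1 << x
        return [i for i in range(1, n + 1) if not (present >> i) & 1]

    print(find_missing_bitset([2, 9, 3, 7, 8, 6, 5, 1], 10))  # -> [4, 10]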

The problem with the solutions based on sums of numbers is that they do not take into account the cost of storing and working with numbers with large exponents... in practice, for this to work for very large n, a big-number library would be used. We can analyse the space utilisation of these algorithms.

We can analyse the time and space complexity of sdcvvc's and Dimitris Andreou's algorithms.

Storage:

    l_j = ceil(log_2(sum_{i=1}^n i^j))
    l_j > log_2(n^j)                   (assuming n >= 0, k >= 0)
    l_j > j log_2(n)                   \in \Omega(j log n)
    l_j < log_2((sum_{i=1}^n i)^j) + 1
    l_j < j log_2(n) + j log_2(n + 1) - j log_2(2) + 1
    l_j < j log_2(n) + j + c           \in O(j log n)

So l_j \in \Theta(j log n).

Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)

Time used: assuming that computing a^j takes ceil(log_2 j) time, the total time is:

    t = k ceil(\sum_{i=1}^n log_2(i)) = k ceil(log_2(\prod_{i=1}^n i))
    t > k log_2(n^n + O(n^(n-1)))
    t > k log_2(n^n) = k n log_2(n)    \in \Omega(k n log n)
    t < k log_2(\prod_{i=1}^n i^i) + 1
    t < k n log_2(n) + 1               \in O(k n log n)

Total time used: \Theta(k n log n)

If this time and space are satisfactory, you can use a simple recursive algorithm. Let b!i denote the i-th entry in the bag, n the count of numbers before removals, and k the count of removals. In Haskell syntax…

    let
      -- b is the bag array (1-based), n and k as above.
      -- O(1)
      isInRange low high v = (v >= low) && (v <= high)

      -- O(n - k)
      countInRange low high = sum $ map (fromEnum . isInRange low high . (!) b) [1..(n - k)]

      findMissing l low high krange
        -- O(1) if there is nothing to find.
        | krange == 0 = l
        -- O(1) if there is only one possibility.
        | low == high = low : l
        -- Otherwise a total of O(k n log n) time.
        | otherwise =
            let mid   = (low + high) `div` 2
                klow  = countInRange low mid
                khigh = krange - klow
            in  findMissing (findMissing l low mid klow) (mid + 1) high khigh
    in findMissing [] 1 n k

Storage used: O(k) for the list, O(log(n)) for the stack: O(k + log(n)). This algorithm is more intuitive, has the same time complexity, and uses less space.

Wait a minute. As the question states, there are 100 numbers in the bag. No matter how large k is, the problem can be solved in constant time, because you can use a set and remove the bag's numbers from it in a loop of at most 100 - k iterations. 100 is a constant. The set of remaining numbers is your answer.

If we generalise the solution to the numbers from 1 to N, nothing changes except that N is not a constant, so we are at O(N - k) = O(N) time. For instance, if we use a bit set, we set the bits to 1 in O(N) time, iterate through the numbers, setting the bits to 0 as we go (O(N - k) = O(N)), and then we have the answer.

It seems to me that the interviewer was asking how you would print out the contents of the final set in O(k) time rather than O(N) time. Clearly, with a bit set, you have to iterate over all N bits to determine whether to print each number. However, if you change the way the set is implemented, you can print out the numbers in k iterations. This is done by putting the numbers into objects that are stored in both a hash set and a doubly linked list. When you remove an object from the hash set, you also remove it from the list. The answers will be left in the list, which is now of length k.
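
A minimal Python sketch of that idea; collections.OrderedDict is internally a hash table plus a doubly linked list, so it plays the role of the combined structure described above (the function name and example are mine):

    from collections import OrderedDict

    def find_missing(bag, n):
        remaining = OrderedDict((i, None) for i in range(1, n + 1))  # O(N) setup
        for x in bag:                  # O(N - k) deletions, O(1) each
            del remaining[x]
        return list(remaining)         # O(k): only walks the linked list of survivors

    print(find_missing([2, 9, 3, 7, 8, 6, 5, 1], 10))  # -> [4, 10]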

Here is a solution that uses k bits of extra storage, without any clever tricks, just plain and simple. Execution time O(n), extra space O(k). Just to prove that this can be solved without first reading up on the solution or being a genius:

    void puzzle (int* data, int n, bool* extra, int k)
    {
        // data contains n distinct numbers from 1 to n + k, extra provides
        // space for k extra bits.

        // Rearrange the array so there are (even) even numbers at the start
        // and (odd) odd numbers at the end.
        int even = 0, odd = 0;
        while (even + odd < n)
        {
            if (data [even] % 2 == 0) ++even;
            else if (data [n - 1 - odd] % 2 == 1) ++odd;
            else
            {
                int tmp = data [even];
                data [even] = data [n - 1 - odd];
                data [n - 1 - odd] = tmp;
                ++even; ++odd;
            }
        }

        // Erase the lowest bits of all numbers and set the extra bits to 0.
        for (int i = even; i < n; ++i) data [i] -= 1;
        for (int i = 0; i < k; ++i) extra [i] = false;

        // Set a bit for every number that is present
        for (int i = 0; i < n; ++i)
        {
            int tmp = data [i];
            tmp -= (tmp % 2);
            if (i >= odd) ++tmp;
            if (tmp <= n) data [tmp - 1] += 1;
            else extra [tmp - n - 1] = true;
        }

        // Print out the missing ones
        for (int i = 1; i <= n; ++i)
            if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
        for (int i = n + 1; i <= n + k; ++i)
            if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);

        // Restore the lowest bits again.
        for (int i = even; i < n; ++i) data [i] += 1;
    }

Can you check whether every number is present or not? If yes, you can try this:

S = the sum of all the numbers in the bag (S < 5050)
Z = the sum of the missing numbers = 5050 - S

If the missing numbers are x and y then:

x = Z - y, and
max(x) = Z - 1

So you check the range from 1 to max(x) and find the numbers.

To solve the 2 (and 3) missing numbers problem, you can modify quickselect, which runs in O(n) on average and uses constant memory if the partitioning is done in place.

  1. Partition the set against a random pivot p into the partition l, which contains the numbers smaller than the pivot, and r, which contains the numbers greater than the pivot.

  2. Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = the count of missing numbers in l, and n - count(r) - p = the count of missing numbers in r).

  3. a) If each partition is missing one number, then use the difference-of-sums approach to find each missing number.

    (1 + 2 + ... + (p-1)) - sum(l) = missing #1
    ((p+1) + (p+2) + ... + n) - sum(r) = missing #2

    b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1, p-2) or (p+1, p+2), depending on which partition is missing the numbers.

    If one partition is missing 2 numbers but is not empty, then recurse into that partition.

With only 2 missing numbers, this algorithm always discards at least one partition, so it retains the O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because as with 2 missing numbers, at most 1 partition can contain multiple missing numbers). However, I am not sure how much the performance degrades as more missing numbers are added.

Here is an implementation that does not use in-place partitioning, so this example does not meet the space requirement, but it does illustrate the steps of the algorithm:

    <?php

    $list = range(1, 100);
    unset($list[3]);
    unset($list[31]);

    findMissing($list, 1, 100);

    function findMissing($list, $min, $max) {
        if (empty($list)) {
            print_r(range($min, $max));
            return;
        }

        $l = $r = [];
        $pivot = array_pop($list);

        foreach ($list as $number) {
            if ($number < $pivot) {
                $l[] = $number;
            } else {
                $r[] = $number;
            }
        }

        if (count($l) == $pivot - $min - 1) {
            // only 1 missing number, use difference of sums
            print array_sum(range($min, $pivot - 1)) - array_sum($l) . "\n";
        } else if (count($l) < $pivot - $min) {
            // more than 1 missing number, recurse
            findMissing($l, $min, $pivot - 1);
        }

        if (count($r) == $max - $pivot - 1) {
            // only 1 missing number, use difference of sums
            print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
        } else if (count($r) < $max - $pivot) {
            // more than 1 missing number, recurse
            findMissing($r, $pivot + 1, $max);
        }
    }

Demo

If you have the sum and the product of both lists, you can solve Q2.

(l1 is the original list, l2 is the modified list)

    d = sum(l1) - sum(l2)
    m = mul(l1) / mul(l2)

We can optimise this, since the sum of an arithmetic series is n times the average of the first and last terms:

    n = len(l1)
    d = (n/2)*(n+1) - sum(l2)

Now we know that (if a and b are the removed numbers):

    a + b = d
    a * b = m

So we can rearrange to:

    a = d - b
    b * (d - b) = m

and multiply out:

    -b^2 + d*b = m

And rearrange so the right side is zero:

    -b^2 + d*b - m = 0

Then we can solve with the quadratic formula:

    b = (-d + sqrt(d^2 - 4*m)) / -2
    a = d - b

Sample Python 3 code:

    from functools import reduce
    import operator
    import math

    x = list(range(1, 21))            # the full list 1..20
    sx = len(x) * (len(x) + 1) // 2   # expected sum of the full list
    x.remove(15)
    x.remove(5)

    mul = lambda l: reduce(operator.mul, l)

    d = sx - sum(x)                   # a + b
    m = mul(range(1, 21)) // mul(x)   # a * b (integer division keeps it exact)

    b = (-d + math.sqrt(d**2 - 4*m)) / -2
    a = d - b
    print(a, b)  # 15.0 5.0

I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.)

I think this can be done without any complex mathematical equations and theories. Below is a proposal for an in place and O(2n) time complexity solution:

Input form assumptions :

# of numbers in bag = n

# of missing numbers = k

The numbers in the bag are represented by an array of length n

Length of input array for the algo = n

Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first element in the array.

For example: initially the bag looks like [2,9,3,7,8,6,4,5,1,10]. If 4 is taken out, the value at 4's position will become 2 (the first element of the array). Therefore, after taking 4 out, the bag will look like [2,9,3,7,8,6,2,5,1,10].

The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX as the array is traversed.

    IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
    {
        List<int> missingNumbers = new List<int>();
        int arrayLength = arrayOfNumbers.Length;

        //First Pass
        for (int i = 0; i < arrayLength; i++)
        {
            int index = Math.Abs(arrayOfNumbers[i]) - 1;
            if (index > -1)
            {
                arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
            }
        }

        //Second Pass to get missing numbers
        for (int i = 0; i < arrayLength; i++)
        {
            //If this index is unvisited, means this is a missing number
            if (arrayOfNumbers[i] > 0)
            {
                missingNumbers.Add(i + 1);
            }
        }

        return missingNumbers;
    }

This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.

So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.

May be this algorithm can work for question 1:

  1. Precompute xor of first 100 integers(val=1^2^3^4….100)
  2. xor the elements as they keep coming from input stream ( val1=val1^next_input)
  3. final answer=val^val1

Or even better:

    def GetValue(A)
        val = 0
        for i = 1 to 100 do
            val = val ^ i
        done
        for value in A do
            val = val ^ value
        done
        return val

This algorithm can in fact be expanded for two missing numbers. The first step remains the same. When we call GetValue with two missing numbers, the result will be a1^a2, where a1 and a2 are the two missing numbers. Let's say

val = a1^a2

Now to sieve out a1 and a2 from val, we take any set bit in val. Let's say the i-th bit is set in val. That means that a1 and a2 have different parity at the i-th bit position. Now we do another iteration on the original array and keep two xor values: one for the numbers which have the i-th bit set, and another for those which don't. We now have two buckets of numbers, and it is guaranteed that a1 and a2 will lie in different buckets. Now repeat what we did for finding one missing element on each of the buckets.
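
A minimal Python sketch of that two-missing-number XOR approach (the helper name and the example are mine):

    def two_missing_xor(bag, n):
        # XOR of 1..n and everything in the bag leaves val = a1 ^ a2.
        val = 0
        for i in range(1, n + 1):
            val ^= i
        for x in bag:
            val ^= x

        # Any set bit of val distinguishes a1 from a2; use the lowest one.
        low_bit = val & -val

        a1 = a2 = 0
        for i in range(1, n + 1):
            if i & low_bit: a1 ^= i
            else:           a2 ^= i
        for x in bag:
            if x & low_bit: a1 ^= x
            else:           a2 ^= x
        return a1, a2

    print(two_missing_xor([2, 3, 6, 1, 5], 7))  # -> 7 and 4 (in some order)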

Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:

 missing = (1..100).to_a - bag 

It's probably not the most efficient solution but it's one I would use in real life if I was faced with such a task in this case (known boundaries, low boundaries). If the set of number would be very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.

You could try using a Bloom Filter . Insert each number in the bag into the bloom, then iterate over the complete 1-k set until reporting each one not found. This may not find the answer in all scenarios, but might be a good enough solution.

I'd take a different approach to that question and probe the interviewer for more details about the larger problem he's trying to solve. Depending on the problem and the requirements surrounding it, the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward approach might not.

For example, it might be that the interviewer is going to dispatch n messages and needs to know the k that didn't result in a reply and needs to know it in as little wall clock time as possible after the (n-k)th reply arrives. Let's also say that the message channel's nature is such that even running at full bore, there's enough time to do some processing between messages without having any impact on how long it takes to produce the end result after the last reply arrives. That time can be put to use inserting some identifying facet of each sent message into a set and deleting it as each corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove its identifier from the set, which in typical implementations takes O(log k+1). After that, the set contains the list of k missing elements and there's no additional processing to be done.

This certainly isn't the fastest approach for batch processing pre-generated bags of numbers because the whole thing runs O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k)) . But it does work for any value of k (even if it's not known ahead of time) and in the example above it was applied in a way that minimizes the most critical interval.

I don't know whether this is efficient or not but I would like to suggest this solution.

  1. Compute xor of the 100 elements
  2. Compute xor of the 98 elements (after the 2 elements are removed)
  3. Now (result of 1) XOR (result of 2) gives you the xor of the two missing numbers, i.e. a XOR b if a and b are the missing elements
  4. Get the sum of the missing numbers with your usual approach of the sum formula difference, and let's say the difference is d.

Now run a loop to get the possible pairs (p,q), both of which lie in [1, 100] and sum to d.

When a pair is obtained check whether (result of 3) XOR p = q and if yes we are done.

Please correct me if I am wrong and also comment on time complexity if this is correct

A very simple way to do it in roughly O(N) time is to remove each element when seen in both lists. This works for unsorted lists too and can be easily further optimized if the lists are both sorted.

    import random

    K = 2
    missingNums = list(range(1, 101))
    incompleteList = list(range(1, 101))

    # Remove K numbers
    for i in range(K):
        valueToRemove = random.choice(incompleteList)
        incompleteList.remove(valueToRemove)

    # Cross off every number that is still present in the incomplete list
    dummyVariable = [missingNums.remove(num) for num in incompleteList if num in missingNums]

    print(missingNums)

There is a general way to generalize streaming algorithms like this. The idea is to use a bit of randomization to hopefully 'spread' the k elements into independent sub problems, where our original algorithm solves the problem for us. This technique is used in sparse signal reconstruction, among other things.

  • Make an array, a , of size u = k^2 .
  • Pick any universal hash function , h : {1,...,n} -> {1,...,u} . (Like multiply-shift )
  • For each i in 1, ..., n increase a[h(i)] += i
  • For each number x in the input stream, decrement a[h(x)] -= x .

If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array will now contain the missing numbers.

The probability that a particular pair is sent to the same bucket, is less than 1/u by definition of a universal hash function. Since there are about k^2/2 pairs, we have that the error probability is at most k^2/2/u=1/2 . That is, we succeed with probability at least 50%, and if we increase u we increase our chances.

Notice that this algorithm takes k^2 logn bits of space (We need logn bits per array bucket.) This matches the space bound from @Dimitris Andreou's answers, which happens to also be randomized. This algorithm also has constant time per update, rather than time k in the case of power-sums.

I'd be surprised if there is a more efficient algorithm than the above. In theory (the space bound) as well as in practice (the actual running time).
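
A minimal Python sketch of this bucketing idea; for simplicity the universal hash is a modular hash with random coefficients rather than multiply-shift, and no retry is attempted when two missing numbers collide (names are mine):

    import random

    def find_missing_hashed(bag, n, k):
        u = max(1, k * k)                      # k^2 buckets
        p = (1 << 61) - 1                      # a prime much larger than n
        a, b = random.randrange(1, p), random.randrange(p)
        h = lambda i: ((a * i + b) % p) % u

        buckets = [0] * u
        for i in range(1, n + 1):              # add every expected number
            buckets[h(i)] += i
        for x in bag:                          # subtract what actually appeared
            buckets[h(x)] -= x

        # If no two missing numbers landed in the same bucket,
        # the non-zero buckets now hold exactly the missing numbers.
        return sorted(v for v in buckets if v != 0)

    print(find_missing_hashed([1, 2, 4, 6, 7, 9, 10], 10, 3))  # likely [3, 5, 8]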

For Q2 this is a solution that is a bit more inefficient than the others, but still has O(N) runtime and takes O(k) space.

The idea is to run the original algorithm two times. In the first one you get a total number which is missing, which gives you an upper bound of the missing numbers. Let's call this number N . You know that the missing two numbers are going to sum up to N , so the first number can only be in the interval [1, floor((N-1)/2)] while the second is going to be in [floor(N/2)+1,N-1] .

Thus you loop on all numbers once again, discarding all numbers that are not included in the first interval. The ones that are, you keep track of their sum. Finally, you'll know one of the missing two numbers, and by extension the second.

I have a feeling that this method could be generalized and maybe multiple searches run in "parallel" during a single pass over the input, but I haven't yet figured out how.
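
A minimal Python sketch of that two-pass idea; here S denotes the sum of the two missing numbers (called N in the text above), and the names are mine:

    def two_missing_two_pass(bag, n):
        # Pass 1: S = the sum of the two missing numbers.
        S = n * (n + 1) // 2 - sum(bag)

        # The smaller missing number lies in [1, (S-1)//2],
        # the larger in [S//2 + 1, S-1], since they are distinct and sum to S.
        hi = min((S - 1) // 2, n)

        # Pass 2: sum only the numbers that fall in the first interval.
        seen = sum(x for x in bag if x <= hi)
        a = hi * (hi + 1) // 2 - seen   # the smaller missing number
        b = S - a                       # and by extension the larger one
        return a, b

    print(two_missing_two_pass([1, 2, 4, 6, 7, 8], 8))  # -> (3, 5)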

You'd probably need clarification on what O(k) means.

Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If sum modulo 2^i is zero, then i is missing.

Pretty simple, right? O(N) time, O(1) storage, and it supports arbitrary k.

Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.

So you could be clever and compute the sum and the sum of squares and the sum of cubes… up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which begs the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?

I have an idea, but this is assuming that the actual size of the array is 100 and the missing numbers are replaced with something else (like -1).

Basically, do a sort that's kind of a modified version of a selection sort that swaps the list in-place. I believe this is O(n) time (correct me if I'm wrong though) because we make use of the fact we already know the numbers that should exist. We swap the value with the "correct" position, until the index we are at has the correct number (or has -1).

After we're done with that, we just loop the list again and the index will basically be the missing numbers

    #Set up the input
    input = (1..100).to_a.shuffle
    input[rand(0..99)] = -1
    input[rand(0..99)] = -1

    def f(a)
      for i in 0..99
        while (a[i] != i+1) && a[i] > 0
          v1 = a[i]
          v2 = a[v1 - 1]
          a[i] = v2
          a[v1 - 1] = v1
        end
      end

      result = []
      a.each_with_index do |value, index|
        if value < 0
          result.push(index + 1)
        end
      end
      return result
    end

    #Returns the 2 missing numbers
    puts f(input)

I believe I have a O(k) time and O(log(k)) space algorithm, given that you have the floor(x) and log2(x) functions for arbitrarily big integers available:

You have a big integer s (one bit per number, hence O(N) bits of space) to which you add 2^x, where x is the next number you find in the bag: s = 2^1 + 2^2 + … This takes O(N) time (which is not a problem for the interviewer). Subtract this from 2^1 + 2^2 + … + 2^N, so that the set bits of s are exactly the missing numbers. Then j = floor(log2(s)) is the biggest number you are looking for. Then s = s - 2^j and you do the above again:

    for (i = 0; i < k; i++)
    {
        j = floor(log2(s));   // highest set bit = largest remaining missing number
        missing[i] = j;
        s -= pow(2, j);       // clear that bit
    }

Now, you usually don't have floor and log2 functions for 2756 -bit integers but instead for doubles. So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired numbers, but this adds an O(N) factor to time complexity

Try to find the product of numbers from 1 to 50:

Let the product P1 = 1 x 2 x 3 x ... x 50

When you take out numbers one by one, multiply them so that you get the product P2. But two numbers are missing here, hence P2 < P1.

The product of the two missing terms, a x b = P1 / P2.

You already know the sum, a + b = S1.

From the above two equations, solve for a and b through a quadratic equation. a and b are your missing numbers.

I think this can be generalized like this:

Denote S, M as the initial values for the sum of arithmetic series and multiplication.

    S = 1 + 2 + 3 + 4 + ... + n = (n+1)*n/2
    M = 1 * 2 * 3 * 4 * ... * n

I should think about a formula to calculate this, but that is not the point. Anyway, if one number is missing, you already provided the solution. However, if two numbers are missing, let's denote the new sum and total product by S1 and M1, which will be as follows:

    S1 = S - (a + b)   .................... (1)   where a and b are the missing numbers
    M1 = M / (a * b)   .................... (2)

Since you know S1, M1, M and S, the above equations can be solved to find a and b, the missing numbers.

Now for the three numbers missing:

    S2 = S - (a + b + c)   .................... (1)   where a, b and c are the missing numbers
    M2 = M / (a * b * c)   .................... (2)

Now you have 3 unknowns while you only have two equations to solve from.

The key is to use the indexes to mark whether a number is present or not in the range. Here we know that we have 1 to N. Time complexity O(n), space complexity O(1).

Follow-up questions: This may be modified to find whether an element is missing from an AP with common difference d. Another variation is to find the first missing positive number from a random array that contains negative numbers as well: first partition around 0 as in quicksort, then run this procedure on the right-hand side of the partition, with the necessary modifications.

    public static void missing(int[] arr) {
        for (int i = 0; i < arr.length; i++) {
            int temp = arr[i];
            // Follow the chain of values starting at i, marking each visited index with -1.
            // temp-1 because array indexes start from 0, i.e. arr[0] == -1 indicates that 1 is present.
            while (temp != -1 && temp >= 1 && temp <= arr.length && arr[temp - 1] != -1) {
                int next = arr[temp - 1]; // remember the value we are about to overwrite
                arr[temp - 1] = -1;       // mark that the number temp is present
                temp = next;              // continue the chain with the displaced value
            }
        }
    }

After this we need to iterate over the array and check: if arr[i] != -1, then i+1 is a missing number. We have to be careful when arr[i] > N.

You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions . The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n , the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.

You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)) . Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.
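
A minimal Python sketch of that one-pass recurrence, computing the first k elementary symmetric functions mod a prime p (the function name and the example are mine):

    def elementary_symmetric(xs, k, p):
        # e[j] holds the j-th elementary symmetric function of the numbers seen so far.
        e = [1] + [0] * k
        for x in xs:
            # Update from high degree to low so each x contributes once per degree:
            # s_j(x_1..x_m, x) = s_j(x_1..x_m) + s_{j-1}(x_1..x_m) * x
            for j in range(k, 0, -1):
                e[j] = (e[j] + e[j - 1] * x) % p
        return e[1:]

    print(elementary_symmetric([3, 5, 8], 3, 10**9 + 7))  # -> [16, 79, 120]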

How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n) . It evaluates to 0 if you put in any of the numbers x_i . Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n . The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.

So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.

Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100! ), we can do these calculations mod p where p is a prime bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to 0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k , not N , we have to factor the polynomial mod p .