Finding Missing Number(s)

“A question not simple enough to answer and answer not simple enough to understand”

problem statement-

We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now few numbers is randomly picked out of the bag. Find the missing numbers.

1. Solution for ‘one’ number missing from the baG


If S_1  is the sum of n natural numbers, then S_1  can be calculated from the following formula

S_1 = \dfrac{n(n+1)}{2}

Validity of this formula can be checked with the help of Principle of Mathematical Induction.

S_1 is what we expect the bag must be having if all the number were present in the bag. However, one number is missing from the bag and our task is to find that number. We can compute the sum of all numbers currently present in the bag iteratively, say S_2. Difference between S_1 and S_2 is the number missing from the bag.

pseudocode
1. S = (n*(n+1))/2 
2. numSum = 0 
3. for i = 1 to n 
4.   numSum = numSum + A[i] 
5. return (S-numSum)

Run time complexity of above algorithm in worst-case is \mathcal{O}(n) , which is fairly good. However, one may decide to optimize this algorithm in case of large evil value of n.

2. SOLUTION FOR ‘Two’ NUMBERs MISSING FROM THE BAG


The above solution works well only if, one number is missing from the list of available numbers. So there may a question that what will be the result produced from the algorithm proposed above when applied in this case? And the answer is sum of two missing numbers. And that is quite obvious to understand from what the above algorithm is trying to calculate. Let us assume that the two missing numbers are a_1 and a_2 .

b_1 = a_1+a_2                                                                    (1)

eq(1) can be calculated from the algorithm proposed above.

Since there are two unknown variables in eq(1) we need another equation to find missing numbers. Let us say that b_2 is the sum of square of two missing numbers.

b_2 = a_1^2+a_2^2                                                                    (2)

And if S_2 is the sum of square of n natural numbers then,

S_2 = \dfrac{n(n+1)(2n+1)}{6}                                                         (3)

Following to this, we calculate the sum of square of given numbers iteratively in a similar manner we calculated the sum of given numbers in previous case discussed. Difference between S_2 and the calculated sum of squares is equivalent to b_2. Thus,

b_2 = S_2 - \sum_{i = 0}^{n}a_i                                                             (4)

Solving the two equation (1) and (2) for a_1 and a_2 is our final step towards the solution of the problem. However, this is easier to say than to be done. There do exist different algorithm for solving such equation, you can search for them. Wolfram alpha solves such equation, but what I am not sure about is; their complexity; which will increase as the number of equation increases.

There is a better way of dealing with this problem and I won’t leave that for next time. We will discuss about it in a moment. Till then we can say that solution of the two equations above is the our expected and required answer.

3. SOLUTION FOR ‘k’ NUMBERS MISSING FROM THE BAG


All we now need to do is, generalize our discussion up until now. We need ‘k’ number of equations of missing numbers, which we can solve and find those missing numbers. Again these different equations will be sum of different powers of missing numbers. Since, we need ‘k’ different equation, powers of missing number will range from i = 1..k

b_1 = a_1+a_2+a_3+...+a_n

b_2 = a_1^2+a_2^+a_3^2+...+a_n^2

.

.

b_i = a_1^i+a_2^i+a_3^i+..+a_n^3

Calculating the value of b_i is entirely similar to what we did in our previous section. Let us assume that we have already computed the the sum of ith powers of n natural number (say S_i). And what I’m assuming is not hypothetical and would take at most \mathcal{O}(n). Next thing we need to compute is the sum of ith power of given numbers. Subtracting the two; we get values of b_i where i = 1..k. Good until now.

Now comes the challenging part. At exactly this point I concluded in previous section that the solution of the equations is our required answer. Which is also true in this case. But now, with so many equation in hand entire algorithm might seems an impractical attempt. But, we do have a comparatively easier way of solving these equation. And at this point you will find Newton’s Identities helpful. In case you are not familiar with the concept of Newton’s Identities, please follow the link provided which have a wonderful explanation of Newton’s Identities.

There is something important to explain here. Look at this polynomial equation here –

f(x) = (x-a_1)(x-a_2)(x-a_3)...(x-a_k)                                         (3)

f(x) = x^k-c_1x^k-1+c_2x^k-2+...+c_k                                          (4)

Coefficients c_1,c_2,c_3...c_k  can be calculated with the help of Newton’s Identities as follows

c_1 = b_1

c_2 = \dfrac{b_2-c_1b_1}{-2}

c_3 = \dfrac{b_3-c_1b_2+c_2b_1}{3}

c_4 = \dfrac{b_4-c_1b_3+c_2b_2-c_3b_1}{-4}

.

.

c_k = \dfrac{b_k-c_1b_k-1+c_2b_k-2-c_3b_k-3+...+(-1)^{k-1}c_{k-1}b_1}{(-1)^{k-1}k}

c_k = \dfrac{b_k-\sum_{i=1}^{k-1}(-1)^{i-1}c_ib_{k-i}}{(-1)^{k-1}k}

The values computed can be substituted in eq(4) . And our last step is to factorize eq(4) to get equation of the form eq(3) .

Its easier to solve eq(4) than k different equation of different power. If you are looking how, here are few helpful resource links

PSEUDOCODE
1. Calculate sum of ith power of n natural numbers
2. Calculate sum of ith power of given numbers
3. Subtract the result from above two steps to get 
   sum of ith power of missing numbers, say bi. 
   Use generalized formula(from Newton's Identities)
   proposed above to compute coefficients of eq(4).
5. Factor eq(4) after substituting coefficient get polynomial 
   equation of the form eq(3).
6. Roots of eq(4) are the missing numbers.

conclusion

This solution has no practicality, but since we were concerned about time as well as space complexity, we have a good approach to restrict complexity to k not N. Remembering the powers is enough to find the solution of above stated problem. How many powers? Well obviously, k – count of number of elements missing from the bag.

There is another approach to this problem, much more practical than this one but at the cost of complexity. What we can do is sort the given numbers in increasing order (\mathcal{O}(nlogn) ). And then count along to find those numbers with difference more than one from the previous one [read this]. This is definitely a solution on which everyone can agree. However, the solution presented above becomes more important and and more right when someone explicitly mentions to consider complexities in terms of k not N.


references
  1. Muthukrishnan – Data Stream Algorithms Puzzle 1 Finding Missing Number
  2. Brilliant: Newton’s Identities
  3. sdcvvc’s Answer on Stack Overflow
Finding Missing Number(s)