August 2015 – Prateek Dwivedi

“A question not simple enough to answer and answer not simple enough to understand”

problem statement-

We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now few numbers is randomly picked out of the bag. Find the missing numbers.

1. Solution for ‘one’ number missing from the baG

If $S_1$ is the sum of n natural numbers, then $S_1$ can be calculated from the following formula

$S_1 = \dfrac{n(n+1)}{2}$

Validity of this formula can be checked with the help of Principle of Mathematical Induction.

$S_1$ is what we expect the bag must be having if all the number were present in the bag. However, one number is missing from the bag and our task is to find that number. We can compute the sum of all numbers currently present in the bag iteratively, say $S_2$ . Difference between $S_1$ and $S_2$ is the number missing from the bag.

pseudocode

1. S = (n*(n+1))/2 
2. numSum = 0 
3. for i = 1 to n 
4.   numSum = numSum + A[i] 
5. return (S-numSum)

Run time complexity of above algorithm in worst-case is $\mathcal{O}(n)$ , which is fairly good. However, one may decide to optimize this algorithm in case of large evil value of n.

2. SOLUTION FOR ‘Two’ NUMBERs MISSING FROM THE BAG

The above solution works well only if, one number is missing from the list of available numbers. So there may a question that what will be the result produced from the algorithm proposed above when applied in this case? And the answer is sum of two missing numbers. And that is quite obvious to understand from what the above algorithm is trying to calculate. Let us assume that the two missing numbers are $a_1$ and $a_2$ .

$b_1 = a_1+a_2$ $(1)$

$eq(1)$ can be calculated from the algorithm proposed above.

Since there are two unknown variables in $eq(1)$ we need another equation to find missing numbers. Let us say that $b_2$ is the sum of square of two missing numbers.

$b_2 = a_1^2+a_2^2$ $(2)$

And if $S_2$ is the sum of square of n natural numbers then,

$S_2 = \dfrac{n(n+1)(2n+1)}{6}$ $(3)$

Following to this, we calculate the sum of square of given numbers iteratively in a similar manner we calculated the sum of given numbers in previous case discussed. Difference between $S_2$ and the calculated sum of squares is equivalent to $b_2$ . Thus,

$b_2 = S_2 - \sum_{i = 0}^{n}a_i$ $(4)$

Solving the two equation $(1)$ and $(2)$ for $a_1$ and $a_2$ is our final step towards the solution of the problem. However, this is easier to say than to be done. There do exist different algorithm for solving such equation, you can search for them. Wolfram alpha solves such equation, but what I am not sure about is; their complexity; which will increase as the number of equation increases.

There is a better way of dealing with this problem and I won’t leave that for next time. We will discuss about it in a moment. Till then we can say that solution of the two equations above is the our expected and required answer.

3. SOLUTION FOR ‘k’ NUMBERS MISSING FROM THE BAG

All we now need to do is, generalize our discussion up until now. We need ‘k’ number of equations of missing numbers, which we can solve and find those missing numbers. Again these different equations will be sum of different powers of missing numbers. Since, we need ‘k’ different equation, powers of missing number will range from $i = 1..k$

$b_1 = a_1+a_2+a_3+...+a_n$

$b_2 = a_1^2+a_2^+a_3^2+...+a_n^2$

$b_i = a_1^i+a_2^i+a_3^i+..+a_n^3$

Calculating the value of $b_i$ is entirely similar to what we did in our previous section. Let us assume that we have already computed the the sum of ith powers of n natural number (say $S_i$ ). And what I’m assuming is not hypothetical and would take at most $\mathcal{O}(n)$ . Next thing we need to compute is the sum of ith power of given numbers. Subtracting the two; we get values of $b_i$ where $i = 1..k$ . Good until now.

Now comes the challenging part. At exactly this point I concluded in previous section that the solution of the equations is our required answer. Which is also true in this case. But now, with so many equation in hand entire algorithm might seems an impractical attempt. But, we do have a comparatively easier way of solving these equation. And at this point you will find Newton’s Identities helpful. In case you are not familiar with the concept of Newton’s Identities, please follow the link provided which have a wonderful explanation of Newton’s Identities.

There is something important to explain here. Look at this polynomial equation here –

$f(x) = (x-a_1)(x-a_2)(x-a_3)...(x-a_k)$ $(3)$

$f(x) = x^k-c_1x^k-1+c_2x^k-2+...+c_k$ $(4)$

Coefficients $c_1,c_2,c_3...c_k$ can be calculated with the help of Newton’s Identities as follows

$c_1 = b_1$

$c_2 = \dfrac{b_2-c_1b_1}{-2}$

$c_3 = \dfrac{b_3-c_1b_2+c_2b_1}{3}$

$c_4 = \dfrac{b_4-c_1b_3+c_2b_2-c_3b_1}{-4}$

$c_k = \dfrac{b_k-c_1b_k-1+c_2b_k-2-c_3b_k-3+...+(-1)^{k-1}c_{k-1}b_1}{(-1)^{k-1}k}$

$c_k = \dfrac{b_k-\sum_{i=1}^{k-1}(-1)^{i-1}c_ib_{k-i}}{(-1)^{k-1}k}$

The values computed can be substituted in $eq(4)$ . And our last step is to factorize $eq(4)$ to get equation of the form $eq(3)$ .

Its easier to solve $eq(4)$ than $k$ different equation of different power. If you are looking how, here are few helpful resource links

PSEUDOCODE

1. Calculate sum of ith power of n natural numbers
2. Calculate sum of ith power of given numbers
3. Subtract the result from above two steps to get 
   sum of ith power of missing numbers, say bi. 
   Use generalized formula(from Newton's Identities)
   proposed above to compute coefficients of eq(4).
5. Factor eq(4) after substituting coefficient get polynomial 
   equation of the form eq(3).
6. Roots of eq(4) are the missing numbers.

conclusion

This solution has no practicality, but since we were concerned about time as well as space complexity, we have a good approach to restrict complexity to k not N. Remembering the powers is enough to find the solution of above stated problem. How many powers? Well obviously, k – count of number of elements missing from the bag.

There is another approach to this problem, much more practical than this one but at the cost of complexity. What we can do is sort the given numbers in increasing order ( $\mathcal{O}(nlogn)$ ). And then count along to find those numbers with difference more than one from the previous one [read this]. This is definitely a solution on which everyone can agree. However, the solution presented above becomes more important and and more right when someone explicitly mentions to consider complexities in terms of k not N.

Prateek Dwivedi

Menu

Month: August 2015

Finding Missing Number(s)