Finding missing number using binary search

Finding missing number using binary search - c++

I am reading the book Programming Pearls.
Question: Given a sequential file that contains at most four billion
32 bit integers in random order, find a 32-bit integer that isn't in
the file (and there must be at least one missing). This problem has to
be solved if we have a few hundred bytes of main memory and several
sequential files.
Solution: To set this up as a binary search we have to define a range,
a representation for the elements within the range, and a probing
method to determine which half of a range holds the missing integer.
How do we do this?
We'll use as the range a sequence of integers known to contain at least
one missing element, and we'll represent the range by a file
containing all the integers in it. The insight is that we can probe a
range by counting the elements above and below its midpoint: either
the upper or the lower range has at most half the elements in the total
range. Because the total range has a missing element, the smaller half
must also have a missing element. These are most of the ingredients of a
binary search algorithm for the above problem.
The above text is copyright Jon Bentley, from the book Programming Pearls.
Some info is provided at the following link:
"Programming Pearls" binary search help
How do we search in passes using binary search? I also could not follow the example given in the link above. Please help me understand the logic with just 5 integers rather than a million integers.

Why don't you re-read the answer in the post "Programming Pearls" binary search help. It explains the process on 5 integers as you ask.
The idea is that you parse each list and break it into 2 (this is where binary part comes from) separate lists based on the value in the first bit.
I.e. showing binary representation of actual numbers
Original List "": 001, 010, 110, 000, 100, 011, 101 => (broken into)
(we remove the first bit and append it to the "name" of the new list)
To form each of the lists below we took the values starting with [0 or 1] from the list above
List "0": 01, 10, 00, 11 (is formed from subset 001, 010, 000, 011 of List "" by removing the first bit and appending it to the "name" of the new list)
List "1": 10, 00, 01 (is formed from subset 110, 100, 101 of List "" by removing the first bit and appending it to the "name" of the new list)
Now take one of the resulting lists in turn and repeat the process:
List "0" becomes your original list and you break it into
List "00" and
List "01" (the last digit of each name is again the first remaining bit of the numbers in the list being broken).
Carry on until you end up with the empty list(s).
EDIT
Process step by step:
List "": 001, 010, 110, 000, 100, 011, 101 =>
List "0": 01, 10, 00, 11 (from subset 001, 010, 000, 011 of the List "") =>
List "00": 1, 0 (from subset 01, 00 of the List "0") =>
List "000": 0 [final result] (from subset 0 of the List "00")
List "001": 1 [final result] (from subset 1 of the List "00")
List "01": 0, 1 (from subset 10, 11 of the List "0") =>
List "010": 0 [final result] (from subset 0 of the List "01")
List "011": 1 [final result] (from subset 1 of the List "01")
List "1": 10, 00, 01 (from subset 110, 100, 101 of the List "") =>
List "10": 0, 1 (from subset 00, 01 of the List "1") =>
List "100": 0 [final result] (from subset 0 of the List "10")
List "101": 1 [final result] (from subset 1 of the List "10")
List "11": 0 (from subset 10 of the List "1") =>
List "110": 0 [final result] (from subset 0 of the List "11")
List "111": absent [final result] (from subset EMPTY of the List "11")
The advantage of this method is that it lets you find ANY number of missing numbers in the set, i.e. it still works if more than one is missing.
P.S. As far as I remember, when exactly one number is missing from the complete range, there is an even more elegant solution: XOR all the numbers together.
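The splitting process above can be sketched in Python. This is only a minimal sketch: it assumes the numbers fit in a known bit width, that there are fewer numbers than values in that range, and that they sit in an in-memory list rather than in sequential files. The name find_missing_by_bits is hypothetical. At each bit it follows only the smaller of the two lists, so it finds one missing value rather than all of them.

```python
def find_missing_by_bits(numbers, bits):
    """Return one value in [0, 2**bits) that does not occur in numbers.

    Assumes len(numbers) < 2**bits, so at least one value is missing.
    """
    prefix = 0                  # the "name" of the current list, as an integer
    candidates = list(numbers)
    for bit in reversed(range(bits)):
        zeros = [n for n in candidates if not (n >> bit) & 1]
        ones = [n for n in candidates if (n >> bit) & 1]
        # The smaller list cannot cover its whole half of the range,
        # so a missing value must lie there.
        if len(zeros) <= len(ones):
            candidates = zeros
        else:
            candidates = ones
            prefix |= 1 << bit
    return prefix


example = [0b001, 0b010, 0b110, 0b000, 0b100, 0b011, 0b101]
print(format(find_missing_by_bits(example, 3), "03b"))
```

For the seven-number example this follows List "1", then List "11", and ends at the empty List "111".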

The idea is to solve an easier problem:
Is the missing value in the range [minVal, X] or in (X, maxVal]?
If you know this, you can move X and check again.
For example, you have 3, 4, 1, 5 (2 is missing).
You know that minVal = 1, maxVal = 5.
Range = [1, 5], X = 3. There should be 3 integers in the range [1, 3] and 2 in the range [4, 5]. There are only 2 in the range [1, 3], so you keep looking in [1, 3].
Range = [1, 3], X = 2. There should be 2 values in the range [1, 2] but there is only 1, so you keep looking in [1, 2].
Range = [1, 2], X = 1. The range [1, 1] is full, and there are no values in the range [2, 2], so 2 is your answer.
EDIT: Some pseudo-C++ code:
int minVal = 1, maxVal = 5; // choose correct values
while (minVal < maxVal) {
int X = (minVal + maxVal) / 2;
int leftNumber = /* how many values are in the range [minVal, X] */;
if (leftNumber < (X - minVal + 1)) maxVal = X; // the left half has a gap
else minVal = X + 1; // the left half is full, so the gap is on the right
}
// now minVal == maxVal is the missing value
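Here is a small runnable version of the same loop, as a sketch rather than a definitive implementation. It assumes the values are distinct and held in a list; the helper names count_in_range and find_missing_in_range are made up for this illustration, and each call to count_in_range stands for one sequential pass over the file.

```python
def count_in_range(values, lo, hi):
    """One sequential pass: count the values v with lo <= v <= hi."""
    return sum(1 for v in values if lo <= v <= hi)


def find_missing_in_range(values, min_val, max_val):
    """Binary search on the value range [min_val, max_val].

    Assumes the values are distinct and at least one value in the
    range is missing.
    """
    while min_val < max_val:
        x = min_val + (max_val - min_val) // 2
        if count_in_range(values, min_val, x) < x - min_val + 1:
            max_val = x          # the gap is in [min_val, x]
        else:
            min_val = x + 1      # [min_val, x] is full, look above x
    return min_val


print(find_missing_in_range([3, 4, 1, 5], 1, 5))
```

With the four numbers from the example it narrows [1, 5] to [1, 3], then [1, 2], then [2, 2].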

Here's a simple C solution which should illustrate the technique. To abstract away any tedious file I/O details, I'm assuming the existence of the following three functions:
unsigned long next_number (void) reads a number from the file and returns it. When called again, the next number in the file is returned, and so on. Behavior when the end of file is encountered is undefined.
int numbers_left (void) returns a true value if there are more numbers available to be read using next_number(), false if the end of the file has been reached.
void return_to_start (void) rewinds the reading position to the start of the file, so that the next call to next_number() returns the first number in the file.
I'm also assuming that unsigned long is at least 32 bits wide, as required for conforming ANSI C implementations; modern C programmers may prefer to use uint32_t from stdint.h instead.
Given these assumptions, here's the solution:
unsigned long count_numbers_in_range (unsigned long min, unsigned long max) {
unsigned long count = 0;
return_to_start();
while ( numbers_left() ) {
unsigned long num = next_number();
if ( num >= min && num <= max ) {
count++;
}
}
return count;
}
unsigned long find_missing_number (void) {
unsigned long min = 0, max = 0xFFFFFFFF;
while ( min < max ) {
unsigned long midpoint = min + (max - min) / 2;
unsigned long count = count_numbers_in_range( min, midpoint );
if ( count < midpoint - min + 1 ) {
max = midpoint; // at least one missing number at or below midpoint
} else {
min = midpoint + 1; // no missing numbers at or below midpoint, must be above
}
}
return min;
}
One detail to note is that min + (max - min) / 2 is the safe way to calculate the average of min and max; it won't produce bogus results due to overflowing intermediate values like the seemingly simpler (min + max) / 2 might.
Also, even though it would be tempting to solve this problem using recursion, I chose an iterative solution instead for two reasons: first, because it (arguably) shows more clearly what's actually being done, and second, because the task was to minimize memory use, which presumably includes the stack too.
Finally, it would be easy to optimize this code, e.g. by returning as soon as count equals zero, by counting the numbers in both halves of the range in one pass and choosing the one with more missing numbers, or even by extending the binary search to n-ary search for some n > 2 to reduce the number of passes. However, to keep the example code as simple as possible, I've left such optimizations unmade. If you like, you may want to, say, try modifying the code so that it requires at most eight passes over the file instead of the current 32. (Hint: use a 16-element array.)
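For readers who want to check their answer to that exercise, here is one possible sketch in Python. It is only an illustration under stated assumptions: read_numbers is a hypothetical callable that returns a fresh iterator over the data (playing the role of return_to_start() plus next_number()), bits is a multiple of 4, and the data holds fewer than 2**bits numbers. Each pass fixes four more bits of the answer by picking the emptiest of 16 sub-ranges.

```python
def find_missing_16ary(read_numbers, bits=32):
    """Return (missing_value, passes) using a 16-way split per pass."""
    prefix = 0   # high bits of the answer chosen so far
    fixed = 0    # number of high bits already chosen
    passes = 0
    while fixed < bits:
        shift = bits - fixed - 4
        counts = [0] * 16                        # the 16-element array
        for n in read_numbers():                 # one sequential pass
            if (n >> (bits - fixed)) == prefix:  # n is in the current range
                counts[(n >> shift) & 0xF] += 1
        passes += 1
        bucket = counts.index(min(counts))       # emptiest sixteenth
        prefix = (prefix << 4) | bucket
        fixed += 4
    return prefix, passes


# Small demonstration with 8-bit values: everything except 0xA7 is present.
data = [v for v in range(256) if v != 0xA7]
print(find_missing_16ary(lambda: iter(data), bits=8))
```

The emptiest sub-range always holds fewer numbers than it has values, so after bits / 4 passes the remaining range is a single value with a count of zero.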

Actually, suppose we have a range of integers from a to b, i.e. [a..b].
If our sequence contains b - a distinct integers from this range, then exactly one is missing.
And if only one is missing, we can compute the result in a single pass.
First we calculate the sum of all integers in the range [a..b], which equals:
sum = (a + b) * (b - a + 1) / 2
Then we calculate the sum of all integers in our sequence:
long sum1 = 0;
for (int i = 0; i < b - a; i++)
sum1 += arr[i];
Then the missing element is the difference of those two sums:
long result = sum - sum1;
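A quick way to check this on a small input is the sketch below. It assumes exactly one value of [a..b] is missing and none is repeated; the function names are made up for this illustration. It also shows the XOR variant mentioned in an earlier answer, which avoids large intermediate sums.

```python
from functools import reduce
from operator import xor


def missing_by_sum(seq, a, b):
    """Expected sum of a..b minus the actual sum of the sequence."""
    expected = (a + b) * (b - a + 1) // 2
    return expected - sum(seq)


def missing_by_xor(seq, a, b):
    """XOR of all values in a..b, XORed with all values in the sequence."""
    return reduce(xor, range(a, b + 1), 0) ^ reduce(xor, seq, 0)


print(missing_by_sum([3, 4, 1, 5], 1, 5))
print(missing_by_xor([3, 4, 1, 5], 1, 5))
```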

When you have seen 2^31 zeros or ones in the ith binary digit place, your answer has a one or a zero, respectively, in the ith place. (Example: 2^31 ones in the 5th binary position means the answer has a zero in the 5th binary position.) This assumes exactly one value is missing and no value is repeated.
First draft of C code:
uint32_t binaryHistogram[32], *list4BILLION, answer = 0, placesChecked[32];
uint64_t limit = 4294967296ULL;
uint32_t halfLimit = 4294967296ULL/2;
uint64_t i;
int j, done;
//General method to point to list since this detail is not important to the question.
list4BILLION = NULL; //placeholder: point this at the actual data
//Initialize arrays to zero. binaryHistogram[j] is the number of 1s seen in bit j as you parse through the list
for(j=0;j<32;j++)
{
binaryHistogram[j] = 0;
placesChecked[j] = 0;
}
//Only sum up for first half of the 4 billion numbers
for(i=0;i<halfLimit;i++)
{
for(j=0;j<32;j++)
{
binaryHistogram[j] += (list4BILLION[i] >> j) & 1;
}
}
//Check each bit position to see whether halfLimit ones or zeros have been seen
for(i=halfLimit;i<limit;i++)
{
done = 1; //Dont need to continue to the end if all places are checked
for(j=0;j<32;j++)
{
if(placesChecked[j] == 0) //this bit of the answer is still unknown
{
done = 0;
binaryHistogram[j] += (list4BILLION[i] >> j) & 1;
if(binaryHistogram[j] >= halfLimit) //2^31 ones seen: answer bit stays 0
{
placesChecked[j] = 1;
}
else if((i + 1) - binaryHistogram[j] >= halfLimit) //2^31 zeros seen: answer bit is 1
{
answer += ((uint32_t)1 << j);
placesChecked[j] = 1;
}
}
}
if(done) break; //Dont need to pass through the whole list
}

Related

C++ lambda expression for sort follows relative ordering

I solved a leetcode problem 2191. Sort the Jumbled Numbers. My doubt is why my code is able to maintain the relative ordering in sorting when mapped values of the elements are equal. Here is the Leetcode question description:
You are given a 0-indexed integer array mapping which represents the mapping rule of a shuffled decimal system. mapping[i] = j means digit i should be mapped to digit j in this system.
The mapped value of an integer is the new integer obtained by replacing each occurrence of digit i in the integer with mapping[i] for all 0 <= i <= 9.
You are also given another integer array nums. Return the array nums sorted in non-decreasing order based on the mapped values of its elements.
Notes:
Elements with the same mapped values should appear in the same relative order as in the input.
The elements of nums should only be sorted based on their mapped values and not be replaced by them.
Example 1
Input: mapping = [8,9,4,0,2,1,3,5,7,6], nums = [991,338,38]
Output: [338,38,991]
Explanation:
Map the number 991 as follows:
1. mapping[9] = 6, so all occurrences of the digit 9 will become 6.
2. mapping[1] = 9, so all occurrences of the digit 1 will become 9.
Therefore, the mapped value of 991 is 669.
338 maps to 007, or 7 after removing the leading zeros.
38 maps to 07, which is also 7 after removing leading zeros.
Since 338 and 38 share the same mapped value, they should remain in the same relative order, so 338 comes before 38.
Thus, the sorted array is [338,38,991].
Example 2
Input: mapping = [0,1,2,3,4,5,6,7,8,9], nums = [789,456,123]
Output: [123,456,789]
Explanation: 789 maps to 789, 456 maps to 456, and 123 maps to 123. Thus, the sorted array is [123,456,789].
My Accepted C++ Code
class Solution {
public:
// function for converting a given number n according to mapped values
int calculate(int n, vector<int>& map){
int ans = 0, mul = 1;
if(n == 0)
return map[0];
while(n != 0){
int digit = n % 10;
ans += map[digit] * mul;
mul *= 10;
n /= 10;
}
return ans;
}
vector<int> sortJumbled(vector<int>& mapping, vector<int>& nums) {
// C++ lambda expression added inside sort function
sort(begin(nums), end(nums), [&](int x , int y){
return calculate(x, mapping) < calculate(y, mapping);
});
return nums;
}
};
My Doubt
Take Example 1, where nums = [991,338,38] and the corresponding mapped values are [669, 7, 7]. The expected output is [338,38,991], and my code produces it. My doubt is why 338 comes first in the output when my lambda expression only checks the a < b condition. As I understand it, whenever a < b is true, a comes first in the sorted order; otherwise b does. So when calculate(338, mapping) < calculate(38, mapping) is evaluated, it returns false, right? Then shouldn't 38 come first? Can someone please explain how my code ends up with a stable order when elements have equal mapped values?

Does this problem have overlapping subproblems?

I am trying to solve this question on LeetCode.com:
You are given an m x n integer matrix mat and an integer target. Choose one integer from each row in the matrix such that the absolute difference between target and the sum of the chosen elements is minimized. Return the minimum absolute difference. (The absolute difference between two numbers a and b is the absolute value of a - b.)
So for input mat = [[1,2,3],[4,5,6],[7,8,9]], target = 13, the output should be 0 (since 1+5+7=13).
The solution I am referring is as below:
int dp[71][70 * 70 + 1] = {[0 ... 70][0 ... 70 * 70] = INT_MAX};
int dfs(vector<set<int>>& m, int i, int sum, int target) {
if (i >= m.size())
return abs(sum - target);
if (dp[i][sum] == INT_MAX) {
for (auto it = begin(m[i]); it != end(m[i]); ++it) {
dp[i][sum] = min(dp[i][sum], dfs(m, i + 1, sum + *it, target));
if (dp[i][sum] == 0 || sum + *it > target)
break;
}
} else {
// cout<<"Encountered a previous value!\n";
}
return dp[i][sum];
}
int minimizeTheDifference(vector<vector<int>>& mat, int target) {
vector<set<int>> m;
for (auto &row : mat)
m.push_back(set<int>(begin(row), end(row)));
return dfs(m, 0, 0, target);
}
I don't follow how this problem is solvable by dynamic programming. The states apparently are the row i and the sum (from row 0 to row i-1). Given that the problem constraints are:
m == mat.length
n == mat[i].length
1 <= m, n <= 70
1 <= mat[i][j] <= 70
1 <= target <= 800
My understanding is that we would never encounter a sum that we have previously encountered (all values are positive). Even the debug cout statement that I added does not print anything on the sample inputs given in the problem.
How could dynamic programming be applicable here?

This problem is NP-hard, since the 0-1 knapsack problem reduces to it pretty easily.
This problem also has a dynamic programming solution that is similar to the one for 0-1 knapsack:
Find all the sums you can make with a number from the first row (that's just the numbers in the first row).
For each subsequent row, add all the numbers from the ith row to all the previously accessible sums to find the sums you can get after i rows.
If you need to be able to recreate a path through the matrix, then for each sum at each level, remember the preceding one from the previous level.
There are indeed overlapping subproblems, because there will usually be multiple ways to get a lot of the sums, and you only have to remember and continue from one of them.
Here is your example:
sums from row 1: 1, 2, 3
sums from rows 1-2: 5, 6, 7, 8, 9
sums from rows 1-3: 12, 13, 14, 15, 16, 17, 18
As you see, we can make the target sum. There are a few ways:
7+4+2, 7+5+1, 8+4+1
Some targets like 15 have a lot more ways. As the size of the matrix increases, the amount of overlap tends to increase, and so this solution is reasonably efficient in many cases. The total complexity is in O(M * N * max_weight).
But, this is an NP-hard problem, so this is not always tractable -- max_weight can grow exponentially with the size of the problem.
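The row-by-row procedure described above can be sketched with a set of reachable sums. This is a minimal illustration, not the solution from the question; the function names are made up here. Storing the sums in a set is exactly where the overlapping subproblems collapse: a sum reached in several ways is kept only once.

```python
def reachable_sums(mat):
    """Return, for each row count, the set of sums reachable so far."""
    levels = []
    sums = {0}
    for row in mat:
        # Add every distinct number of this row to every previous sum.
        sums = {s + v for s in sums for v in set(row)}
        levels.append(sums)
    return levels


def minimize_the_difference(mat, target):
    """Smallest |sum - target| over all ways to pick one value per row."""
    return min(abs(s - target) for s in reachable_sums(mat)[-1])


example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for rows, sums in enumerate(reachable_sums(example), start=1):
    print(rows, sorted(sums))
print(minimize_the_difference(example, 13))
```

For the example matrix there are 27 ways to pick one number per row but only 7 distinct sums after the last row.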

Perfect sum problem with fixed subset size

I am looking for a least time-complex algorithm that would solve a variant of the perfect sum problem (initially: finding all variable size subset combinations from an array [*] of integers of size n that sum to a specific number x) where the subset combination size is of a fixed size k and return the possible combinations without direct and also indirect (when there's a combination containing the exact same elements from another in another order) duplicates.
I'm aware this problem is NP-hard, so I am not expecting a perfect general solution but something that could at least run in a reasonable time in my case, with n close to 1000 and k around 10
Things I have tried so far:
Finding a combination, then doing successive modifications on it and its modifications
Let's assume I have an array such as:
s = [1,2,3,3,4,5,6,9]
So I have n = 8, and I'd like x = 10 for k = 3
I found thanks to some obscure method (bruteforce?) a subset [3,3,4]
From this subset I'm finding other possible combinations by taking two elements out of it and replacing them with other elements that sum the same, i.e. (3, 3) can be replaced by (1, 5) since both got the same sum and the replacing numbers are not already in use. So I obtain another subset [1,5,4], then I repeat the process for all the obtained subsets... indefinitely?
The main issue as suggested here is that it's hard to determine when it's done and this method is rather chaotic. I imagined some variants of this method but they really are work in progress
Iterating through the set to list all k long combinations that sum to x
Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (the magnitude of the number of combinations is 10^27!)
I experimented with several mechanisms for restricting the search area instead of blindly iterating through all possibilities, but it is rather complicated and still a work in progress
What would you suggest? (Snippets can be in any language, but I prefer C++)
[*] To clear the doubt about whether or not the base collection can contain duplicates, I used the term "array" instead of "set" to be more precise. The collection can contain duplicate integers in my case and quite much, with 70 different integers for 1000 elements (counts rounded), for example

With a reasonable sum limit, this problem can be solved using an extension of the dynamic programming approach for the subset sum problem, or for the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time O(x*n), but the output size might grow exponentially, so generating all variants might be a problem.
Make a 3D array, list or vector with outer dimension x+1, for example A[][][]. Every element A[p] of this list contains the list of possible subsets with sum p.
We walk through all elements (call the current element item) of the initial "set" (I noticed repeated elements in your example, so it is not a true set).
Now scan the A[] list from the last entry to the beginning. (This trick avoids using the same item more than once.)
If A[i - item] contains subsets with size < k, we can add all these subsets to A[i], appending item.
After the full scan, A[x] will contain subsets of size k and less that have sum x, and we can filter out only those of size k.
Example of output of my quick-made Delphi program for the next data:
Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;
3 3 4
2 3 5 //distinct 3's
2 3 5
1 4 5
1 3 6
1 3 6 //distinct 3's
1 2 7
To exclude variants that differ only in which repeated element they use (if needed), we can use a non-first occurrence only for subsets already containing the first occurrence of item (so 3 3 4 will be valid while the second 2 3 5 won't be generated)
I translated my Delphi code literally into C++ (weird, I think :)
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<vector<vector<int>>> A;
vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
int k = 3;
int sum = 10;
A.push_back({ {0} }); //fictive array to make non-empty variant
for (int i = 0; i < sum; i++)
A.push_back({{}});
for (int item : Lst) {
for (int i = sum; i >= item; i--) {
for (int j = 0; j < A[i - item].size(); j++)
if (A[i - item][j].size() < k + 1 &&
A[i - item][j].size() > 0) {
vector<int> t = A[i - item][j];
t.push_back(item);
A[i].push_back(t); //add new variant including current item
}
}
}
//output needed variants
for (int i = 0; i < A[sum].size(); i++)
if (A[sum][i].size() == k + 1) {
for (int j = 1; j < A[sum][i].size(); j++) //excluding fictive 0
cout << A[sum][i][j] << " ";
cout << endl;
}
}
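If only the number of variants is needed, the subsets do not have to be stored at all. The sketch below counts them with a table indexed by subset size and sum. It is an illustration with a made-up function name, it assumes positive integers, and it treats equal values at different positions as distinct, like the program output above.

```python
def count_fixed_size_subsets(items, k, target):
    """Count the ways to pick k items (by position) whose sum is target."""
    # ways[c][s] = number of ways to pick c of the items seen so far with sum s
    ways = [[0] * (target + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for item in items:
        # Going from larger c to smaller c means ways[c - 1] has not yet
        # been updated for this item, so each item is used at most once.
        for c in range(k, 0, -1):
            for s in range(target, item - 1, -1):
                ways[c][s] += ways[c - 1][s - item]
    return ways[k][target]


print(count_fixed_size_subsets([1, 2, 3, 3, 4, 5, 6, 7], 3, 10))
```

For the list used above it reports 7, matching the seven lines of output. The running time of this sketch is proportional to n * k * x.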

Here is a complete solution in Python. Translation to C++ is left to the reader.
Like the usual subset sum, generation of the doubly linked summary of the solutions is pseudo-polynomial. It is O(count_values * distinct_sums * depths_of_sums). However actually iterating through them can be exponential. But using generators the way I did avoids using a lot of memory to generate that list, even if it can take a long time to run.
from collections import namedtuple

# This is a doubly linked list.
# (value, tail) will be one group of solutions. (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths (array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count+1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats+1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth+i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth+i][result]
                            )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                            )
                    paths[depth+i][result] = path

                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                        )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums (array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1,2,3,3,4,5,6,9], 10, 3):
    print(path)
Incidentally for your example, here are the solutions:
[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]

You should first sort the array. Second, determine whether the problem is solvable at all, to save time. Take the last k elements and check whether their sum is greater than or equal to x. If it is smaller, you are done: no combination can reach x. If it is exactly equal, you are also done: there are no other combinations. That check is O(n), which feels nice, doesn't it?
If the sum is larger, you have a lot of work to do, and you need to store the combinations you find in a separate array. Replace the smallest of the k numbers with the smallest element in the array. If the sum is still larger than x, do the same for the second smallest, the third, and so on, until the sum drops below x. Once it is below x, increase the value at the last position you changed until the sum hits x; that is one combination. Then take the previous element: if you had 1, 1, 5, 6, you can take a 1, add it to your smallest remaining element, 5, to get 6, and check whether 6 can be written as a combination of two values, stopping once you find one. Then repeat for the other elements.
Your problem can be solved in O(n!) time in the worst case. I would not suggest storing 10^27 combinations: do you even have that much space? At 3 bits for the header and 8 bits for each integer you would need 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer has. You should worry about whether your computer can store this monster before worrying about whether you can solve the problem. With that many combinations, even a quadratic solution would crash your computer, and quadratic is a long way off from O(n!).

A brute force method using recursion might look like this.
For example, given the variables set, x (the target sum) and k (the subset size), the following pseudocode might work:
setSumStructure find(int[] set, int x, int k, int setIdx)
{
int sz = set.length - setIdx;
if (sz < k) return null;
if (sz == k) check whether the sum of set[setIdx] -> set[set.length - 1] == x. If it does, return the set together with the sum, else return null;
for (int i = setIdx; i < set.length - (k - 1); i++)
filter(find(set, x - set[i], k - 1, i + 1));
return filteredSets;
}

How to find all bitmask combinations given i subsequent values can be "0", j can be "1"?

Input.
I have a bit array sized n and two integers, 1<=i<=n and 0<=j<=n.
i indicates the maximum of subsequent numbers that can be 0. j indicates the maximum of subsequent numbers that can be 1.
Desired Output
I search for a method that returns all possible bit arrays sized n that fulfill these constraints.
Just looping through all array combinations (first without constraints) would result in exponential time. (Especially if i/j>>1. I suppose you can do better).
How can I effectively find those bitmask combinations?
Example
Input: i = 1, j = 2, n = 3
Result: Possible arrays are [0,1,0], [1,0,1],[1,1,0],[0,1,1].

This is a nice problem for a dynamic programming solution. It is enough to have a method that returns the number of strings of a given length that start with a given digit (0 or 1). Then the number of strings of length n is the sum of the counts of strings starting with 0 and starting with 1.
A simple Python solution with memoization is:
_c = {}  # Cache
def C(d, n, ij):
    if n <= 1:
        return 1
    if (d, n) not in _c:
        _c[(d, n)] = sum(C(1-d, n-x, ij) for x in range(1, min(ij[d], n)+1))
    return _c[(d, n)]
def B(n, i, j):
    ij = [i, j]  # Easier to index
    _c.clear()  # Clears cache
    return C(0, n, ij) + C(1, n, ij)
print(B(3, 1, 2))
print(B(300, 10, 20))
Result is:
4
1896835555769011113367758506440713303464223490691007178590554687025004528364990337945924158
Since the value for a given digit and length depends only on values for the opposite digit and shorter lengths, the solution can also be obtained by calculating values in order of increasing length. Python solution:
def D(n, i, j):
    c0 = [1]  # Initialize arrays
    c1 = [1]
    for x in range(1, n+1):  # For each next digit calculate value
        c0.append(sum(c1[x-y] for y in range(1, min(i, x)+1)))
        c1.append(sum(c0[x-y] for y in range(1, min(j, x)+1)))
    return c0[-1] + c1[-1]  # Sum of strings of length n starting with 0 and with 1
print(D(3, 1, 2))
print(D(300, 10, 20))
The latter approach is easier to implement in C++.
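The code above counts the arrays; the question also asks for the arrays themselves. One way to list them is a backtracking generator that never extends a run past its limit, so no invalid prefix is explored. This is a sketch with a made-up function name, and the output size can still be exponential in n.

```python
def bit_arrays(n, i, j):
    """Yield every length-n bit list with at most i consecutive 0s
    and at most j consecutive 1s."""
    limits = (i, j)  # limits[bit] is the longest allowed run of that bit

    def extend(prefix, run_bit, run_len):
        if len(prefix) == n:
            yield list(prefix)
            return
        for bit in (0, 1):
            new_len = run_len + 1 if bit == run_bit else 1
            if new_len <= limits[bit]:      # prune runs that are too long
                prefix.append(bit)
                yield from extend(prefix, bit, new_len)
                prefix.pop()

    yield from extend([], None, 0)


for arr in bit_arrays(3, 1, 2):
    print(arr)
```

For i = 1, j = 2, n = 3 it yields the four arrays listed in the question.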

Find the smallest integer whose sum of squares of digits add to the given number

Example:
Input: | Output:
5 –> 12 (1^2 + 2^2 = 5)
500 -> 18888999 (1^2 + 8^2 + 8^2 + 8^2 + 9^2 + 9^2 + 9^2 = 500)
I have written a pretty simple brute-force solution, but it has big performance problems:
#include <iostream>
using namespace std;
int main() {
int n;
bool found = true;
unsigned long int sum = 0;
cin >> n;
int i = 0;
while (found) {
++i;
if (n == 0) { //The code below doesn't work if n = 0, so we assign value to sum right away (in case n = 0)
sum = 0;
break;
}
int j = i;
while (j != 0) { //After each iteration, j's last digit gets stripped away (j /= 10), so we want to stop right when j becomes 0
sum += (j % 10) * (j % 10); //After each iteration, sum gets increased by *(last digit of j)^2*. (j % 10) gets the last digit of j
j /= 10;
}
if (sum == n) { //If we meet our problem's requirements, so that sum of j's each digit squared is equal to the given number n, loop breaks and we get our result
break;
}
sum = 0; //Otherwise, sum gets nullified and the loops starts over
}
cout << i;
return 0;
}
I am looking for a fast solution to the problem.

Use dynamic programming. If we knew the first digit of the optimal solution, then the rest would be an optimal solution for the remainder of the sum. As a result, we can guess the first digit and use a cached computation for smaller targets to get the optimum.
def digitsum(n):
    best = [0]
    for i in range(1, n+1):
        best.append(min(int(str(d) + str(best[i - d**2]).strip('0'))
                        for d in range(1, 10)
                        if i >= d**2))
    return best[n]

Let's try and explain David's solution. I believe his assumption is that given an optimal solution, abcd..., the optimal solution for n - a^2 would be bcd..., therefore if we compute all the solutions from 1 to n, we can rely on previous solutions for numbers smaller than n as we try different subtractions.
So how can we interpret David's code?
(1) Place the solutions for the numbers 1 through n, in order, in the table best:
for i in range(1, n+1):
best.append(...
(2) The solution for the current query, i, is the minimum over an array of choices for the different digits, d, between 1 and 9, for which subtracting d^2 from i is feasible.
The minimum of the conversion to integers...
min(int(
...of the string d concatenated with the string of the solution for i - d^2 previously recorded in the table (stripping the '0' that the solution for zero would otherwise contribute):
str(d) + str(best[i - d**2]).strip('0')
Let's modify the last line of David's code, to see an example of how the table works:
def digitsum(n):
    best = [0]
    for i in range(1, n+1):
        best.append(min(int(str(d) + str(best[i - d**2]).strip('0'))
                        for d in range(1, 10)
                        if i >= d**2))
    return best # original line was 'return best[n]'
We call, digitsum(10):
=> [0, 1, 11, 111, 2, 12, 112, 1112, 22, 3, 13]
When we get to i = 5, our choices for d are 1 and 2 so the array of choices is:
min([ int(str(1) + str(best[5 - 1])), int(str(2) + str(best[5 - 4])) ])
=> min([ int( '1' + '2' ), int( '2' + '1' ) ])
And so on and so forth.

So this is in fact a well-known problem in disguise: the minimum coin change problem, in which you are given a sum and asked to pay it with the minimum number of coins. Here, instead of ones, nickels, dimes and quarters, we have coins of 81, 64, 49, 36, ..., 1 cents.
This is a typical example used to motivate dynamic programming. Unlike a recursive approach, which goes from top to bottom, dynamic programming goes from the bottom up and "memoizes" the results that will be required later. Thus... much faster!
OK, so here is my approach in JS. It probably does a very similar job to David's method.
function getMinNumber(n){
var sls = Array(n).fill(),
sct = [], max;
sls.map((_,i,a) => { max = Math.min(9,~~Math.sqrt(i+1)),
sct = [];
while (max) sct.push(a[i-max*max] ? a[i-max*max].concat(max--)
: [max--]);
a[i] = sct.reduce((p,c) => p.length < c.length ? p : c);
});
return sls[sls.length-1].reverse().join("");
}
console.log(getMinNumber(500));
What we are doing is from bottom to up generating a look up array called sls. This is where memoizing happens. Then starting from from 1 to n we are mapping the best result among several choices. For example if we are to look for 10's partitions we will start with the integer part of 10's square root which is 3 and keep it in the max variable. So 3 being one of the numbers the other should be 10-3*3 = 1. Then we look up for the previously solved 1 which is in fact [1] at sls[0] and concat 3 to sls[0]. And the result is [3,1]. Once we finish with 3 then one by one we start over the same job with one smaller, up until it's 1. So after 3 we check for 2 (result is [2,2,1,1]) and then for 1 (result is [1,1,1,1,1,1,1,1,1,1]) and compare the length of the results of 3, 2 and 1 for the shortest, which is [3,1] and store it at sls[9] (a.k.a a[i]) which is the place for 10 in our look up array.

(Edit) This answer is not correct. The greedy approach does not work for this problem -- sorry.
I'll give my solution in a language agnostic fashion, i.e. the algorithm.
I haven't tested but I believe this should do the trick, and the complexity is proportional to the number of digits in the output:
digitSquared(n) {
% compute the occurrences of each digit
numberOfDigits = [0 0 0 0 0 0 0 0 0]
for m from 9 to 1 {
numberOfDigits[m] = floor(n / (m*m));
n = n % (m*m);
if (n==0)
exit loop;
}
% assemble the final output, largest digits in the lowest places
output = 0
powerOfTen = 0
for m from 9 to 1 {
for i from 1 to numberOfDigits[m] {
output = output + m*10^powerOfTen
powerOfTen = powerOfTen + 1
}
}
return output
}
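A small counterexample shows why the greedy approach fails. The sketch below uses made-up function names: it implements the greedy idea in Python and compares it with the dynamic programming solution from the earlier answer. For n = 18, greedy takes 4^2 first and is left with 1^2 + 1^2, giving 114, while 3^2 + 3^2 gives the smaller number 33.

```python
def greedy_digits(n):
    """Greedy: always take the largest square that still fits."""
    digits = []
    for m in range(9, 0, -1):
        count, n = divmod(n, m * m)
        digits.extend([m] * count)
    # Smallest number with these digits: sort them in ascending order.
    return int("".join(map(str, sorted(digits)))) if digits else 0


def dp_smallest(n):
    """Dynamic programming over all targets from 1 to n."""
    best = [0]
    for i in range(1, n + 1):
        best.append(min(int(str(d) + str(best[i - d * d]).strip('0'))
                        for d in range(1, 10) if i >= d * d))
    return best[n]


print(greedy_digits(18), dp_smallest(18))
```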
